DOCUMENT RESUME 

ED 323 197 SP 032 584 



AUTHOR        Estes, Gary D.; And Others
TITLE         Assessment Component of the California New Teacher
              Project: First Year Report.
INSTITUTION   Far West Lab. for Educational Research and
              Development, San Francisco, Calif.
SPONS AGENCY  Office of Educational Research and Improvement (ED),
              Washington, DC.
PUB DATE      Mar 90
NOTE          211p.
PUB TYPE      Reports - Evaluative/Feasibility (142)
EDRS PRICE    MF01/PC09 Plus Postage.
DESCRIPTORS   *Beginning Teachers; Elementary Secondary Education;
              *Evaluation Criteria; *Evaluation Methods;
              *Measurement Techniques; Teacher Characteristics;
              *Teacher Evaluation



ABSTRACT 

This assessment component of the California New 
Teacher Project consists of the development and pilot testing of 
innovative forms of new teacher assessment. The evaluation of diverse 
approaches to teacher assessments is intended to identify the most 
promising ways in which a comprehensive assessment of teacher 
candidates could inform the certification process and contribute to 
the quality of teaching. The introduction to this document presents a 
review of literature on new teachers, focusing on the incompleteness 
of preservice training, problems of new teachers, and differences 
between novice and expert teachers. Specific contributions of the 
spring 1989 round of pilot testing are discussed. The purpose of the 
pilot testing was to examine in California the functioning of several 
assessment instruments which are considered to be promising exemplars 
of innovative assessment approaches. The evaluation of the various 
components of these instruments (e.g., logistical requirements, 
prompt materials, scoring criteria, training exercises for assessors) 
was intended to provide information concerning the strengths and 
limitations of the assessment approaches which specific instruments 
represented. This final report and analysis of the pilot test 
administration describes each of the instruments, and the ease of 
administration, scoring, content and format, costs, and technical 
qualities. (JD) 



* Reproductions supplied by EDRS are the best that can be made 
* from the original document. 






ASSESSMENT COMPONENT OF THE 
CALIFORNIA NEW TEACHER PROJECT: 
FIRST YEAR REPORT 





MARCH 1990 







Far West Laboratory for Educational 
Research and Development 

Gary D. Estes 
Kendyll Stansbury 
Claudia Long 



With assistance from RMC Research Corporation 

Patricia Wheeler 
Edys Quellmalz 



March 1990 



ACKNOWLEDGEMENTS 



Both the pilot testing and report writing phases of the project benefited from the 
assistance of many people. Staff from the Commission on Teacher Credentialing and 
the State Department of Education provided timely support and critical information that 
enabled the pilot testing to occur during a limited time period. We also benefited great- 
ly from their review of draft reports. The Project Directors of the support projects as- 
sisted in explaining the importance of the evaluation of the assessments to potential 
participants, recruiting teachers, identifying facilities, and suggesting districts to identify 
additional teachers. Their cooperation and patience made our job much easier. We 
also wish to acknowledge the contributions of the teachers who participated in the 
assessments, particularly the teacher who traveled over 100 miles on her birthday to take a 
two-hour examination. 



TABLE OF CONTENTS 

ACKNOWLEDGEMENTS

CHAPTER 1: INTRODUCTION

Review of Literature on New Teachers
Incompleteness of Preservice Training
Problems of New Teachers
Differences between Novice Teachers and Expert Teachers
California New Teacher Project
Assessment Component of the California New Teacher Project
Rationale and Design of the Assessment Component
Pilot Testing in 1989
Terminology Used in the Report

CHAPTER 2: PILOT TEST DESIGN AND ANALYSIS

Design of Pilot Tests
Sources of Instrumentation
Sampling Plan
Analysis of Pilot Tests
Data Collection
Data Reduction
Overview of Analytic Categories
Administration of Assessment
Assessment Content
Assessment Format
Cost Analysis
Technical Quality

CHAPTER 3: CONNECTICUT COMPETENCY INSTRUMENT (CCI)

Administration of Assessment
Overview
Logistics
Security
Assessors and Their Training
Scoring
Teacher, Assessor, FWL, and RMC Staff Perceptions of Administration
Assessment Content
Congruence with California Curriculum Guides and Frameworks
Extent of Coverage of California Standards for Beginning Teachers
Job-relatedness
Appropriateness for Beginning Teachers
Appropriateness across Contexts
Grade level and subject matter
Diverse students
Fairness across Groups of Teachers
Areas of Most/Least Emphasis
Assessment Format
Clarity of the Assessment
Clarity of Assessment Materials
Observation Feedback
Cost Analysis
Assessor Time
Training Costs
Other Costs
Summary
Technical Quality
Development
Reliability
Validity
Conclusions and Recommendations
Administration of Assessment
Assessment Content
Assessment Format
Summary

CHAPTER 4: SEMI-STRUCTURED INTERVIEW: SECONDARY MATHEMATICS

Administration of Assessments
Overview
Logistics
Security
Assessors and Their Training
Teacher and Assessor Impressions of Administration
Scoring
Math Scoring Indicators and Indicator Elements
Scoring Process
Discussion of Scoring System
Scorers and Their Training
Assessment Content
Congruence with California Curriculum Guides and Frameworks
Extent of Coverage of California Standards for Beginning Teachers
Job-relatedness
Appropriateness for Beginning Teachers
Perceptions
Performance on assessment
Appropriateness across Contexts
Grade level and subject area
Diverse students
Fairness across Groups of Teachers
Appropriateness as a Method of Assessment
Assessment Format
Clarity of Assessment
Timing
Nature of the Tasks
Teacher Preferences about Feedback
Cost Analysis
SSI Cost Estimates
Technical Quality
Development
Reliability
Validity
Conclusions and Recommendations
Administration of Assessment
Scoring
Assessment Content
Assessment Format
Summary

CHAPTER 5: SEMI-STRUCTURED INTERVIEW: ELEMENTARY MATHEMATICS

Administration of Assessment
Overview
Security
Assessors and Their Training
Teacher and Assessor Impressions of Administration
Scoring
Scoring Process
Discussion of Scoring System
Scorers and Their Training
Teacher Preferences about Feedback
Assessment Content
Congruence with California Curriculum Guides and Frameworks
Extent of Coverage of California Standards for Beginning Teachers
Job-relatedness
Appropriateness for Beginning Teachers
Perceptions
Performance on assessment tasks
Comparison of beginning and experienced teachers
Appropriateness across Contexts
Grade level and subject matter
Diverse students
Fairness across Groups of Teachers
Appropriateness as a Method of Assessment
Assessment Format
Clarity of Assessment
Format Features
Timing
Choice of tasks
Use of probes
Use of interviewers
Cost Analysis
Technical Quality
Development
Reliability
Validity
Conclusions and Recommendations
Administration of Assessment
Scoring
Assessment Content
Assessment Format
Summary

CHAPTER 6: ELEMENTARY EDUCATION EXAMINATION

Administration of Assessment
Overview
Logistics
Security
Assessors
Scoring
Teacher, FWL, and RMC Staff Impressions of Administration
Assessment Content
Congruence with California Curriculum Guides and Frameworks
Extent of Coverage of California Standards for Beginning Teachers
Job-relatedness
Appropriateness for Beginning Teachers
Appropriateness across Contexts
Grade level and subject area
Diverse students
Fairness across Groups of Teachers
Appropriateness as Method of Assessment
Comparison with Other Multiple-Choice Tests
Assessment Format
Clarity of Oral Overview and Directions
Clarity of Items
Timing of Tests
Feedback
Cost Analysis
Technical Quality
Conclusions and Recommendations
Administration of Assessment
Assessment Content
Assessment Format
Summary

CHAPTER 7: SEMI-STRUCTURED INTERVIEW: SECONDARY SOCIAL SCIENCE

CHAPTER 8: CONCLUSIONS

Assessment Approaches
Classroom Observation
Definition
Characteristics of instruments piloted
Strengths and weaknesses
Semi-Structured Interviews
Definition
Characteristics of instruments piloted
Strengths and weaknesses
Multiple-Choice Examinations
Definition
Characteristics of instruments piloted
Strengths and weaknesses
Framework for Comparing Differing Assessment Approaches
Costs Estimates
Next Steps in Designing a System of Teacher Assessments
Future Pilot Tests
Issues in Design of an Assessment System



TABLES 

TABLE 3.1 Pilot Test Participants: Connecticut Competency Instrument (CCI)

TABLE 3.2 Connecticut Competency Instrument: Subjects by Grade Levels Observed

TABLE 3.3 Congruence of CCI with California Curriculum Guides and Frameworks

TABLE 3.4 Extent of Coverage by the CCI of California Standards for Beginning Teachers

TABLE 4.1 Semi-Structured Interview in Secondary Mathematics: Pilot Test Participants

TABLE 4.2 Coverage of the California Mathematics Framework by SSI-SM

TABLE 4.3 Extent of Coverage by the SSI-SM of California Standards for Beginning Teachers

TABLE 4.4 Percentage of Rater Exact and Adjacent Agreement by Task and Topic Pairs

TABLE 5.1 Semi-Structured Interview in Elementary Mathematics: Pilot Test Participants

TABLE 5.2 Coverage of the California Mathematics Framework by SSI-EM

TABLE 5.3 Extent of Coverage by the SSI-EM of California Standards for Beginning Teachers

TABLE 5.4 Semi-Structured Interview in Elementary Mathematics: Scoring Results

TABLE 6.1 Elementary Education Examination: Pilot Test Participants

TABLE 6.2 Elementary Education Examination Spring 1989 Pilot Test Results

TABLE 6.3 Extent of Coverage by the Elementary Education Examination of California Standards for Beginning Teachers

TABLE 8.1 Analysis of Alternative Assessment Approaches and Their Ability to Assess Specific Teaching Competencies

TABLE 1 (Appendix A) Mean Item Ratings, Coefficient Alphas, and Inter-Rater Correlations for Tests Based on Indicators

TABLE 2 (Appendix A) Mean Item Ratings, Coefficient Alphas, and Inter-Rater Correlations for Tests Based on Consensus Indicators

TABLE 3 (Appendix A) Mean Item Ratings, Coefficient Alphas, and Inter-Rater Correlations for Tests Based on Indicators, Rater Combinations Reversed

TABLE 4 (Appendix A) Adjusted Coefficient Alphas Based on Indicators, Estimated for Equal Test Lengths of 10 Items

TABLE 5 (Appendix A) Mean Item P-Values, Coefficient Alphas, and Inter-Rater Correlations for Tests Based on Dichotomized Indicators

CHARTS

CHART 3.1 Defining Attributes for Two Indicators: Connecticut Competency Instrument

CHART 3.2 CCI Scoring Results

FIGURES

FIGURE 1 (Appendix C) Elementary Education Exam Sample: California Pilot Test Analysis Sample, Social Sciences Majors (N=45)

FIGURE 2 (Appendix C) Elementary Education Exam Sample: California Pilot Test Analysis Sample, Science Majors (N=13)

FIGURE 3 (Appendix C) Elementary Education Exam Sample: California Pilot Test Analysis Sample, Liberal Arts Majors (N=283)

FIGURE 4 (Appendix C) Elementary Education Exam Sample: California Pilot Test Analysis Sample, Education Majors (N=20)

APPENDICES

APPENDIX A SSI-SM Reliability

APPENDIX B SSI-SM Scoring Materials

APPENDIX C Elementary Education Exam Construct Validity




CHAPTER 1: 
INTRODUCTION 



Throughout the nation there is renewed interest in and commitment to educational 
excellence, as shown by the many recent analyses of American education (Boyer, 
1983; Goodlad, 1984; President's Commission for Excellence in Education, 1983) and the 
proposals that have been made for educational reform (Holmes Group, 1986; Shulman, 
1987; Carnegie Corporation, 1986). Although many different aspects of the educational 
enterprise have received attention and suggestions for improvement, there has been 
particular emphasis on the preparation, support, and credentialing of new teachers. This 
emphasis on new teachers has been part of a broader discussion of the further development 
of teaching as a profession (e.g., Wise and Darling-Hammond, 1987; Shulman and 
Sykes, 1986), principally through an increased emphasis on improved quality, opportunities 
for professional development, and expansion of career roles for classroom teachers. 

Policy efforts to increase the quality of teachers have concentrated on improved 
methods of assessing teacher performance, particularly among beginning teachers. 
Several leading advocates of educational reform have argued that more rigorous and 
comprehensive assessments of teachers' knowledge and competence should be developed 
and adopted (Holmes Group, 1986; Shulman, 1987; Carnegie Corporation, 1986). Of 
course, efforts to improve teacher quality must also be concerned with maintaining a 
sufficient quantity of teachers. In California, more than 25,000 teacher candidates were 
enrolled in collegiate training programs during 1988-89, costing the state hundreds of 
millions of dollars annually (Gomez, 1989). Unfortunately, up to half of the state's 
beginning teachers leave their classrooms within five years. This high rate of attrition 
compounds the recruitment problems of school districts, and increases the overall cost of 
preparing a sufficient supply of new teachers. 

While some new teachers leave the profession to earn higher incomes in other 
jobs, growing evidence suggests that the high turnover rate among new teachers is also 
due to a lack of support during the beginning years of teaching. Many new teachers 
quit because of frustration, isolation, and a sense of inability to meet the increasingly 
complex demands that all teachers face. 

Like most other states, California has many programs for the preparation and 
certification of prospective teachers. Most of these programs are offered by accredited 
colleges and universities; some are administered by local school districts, often in 
conjunction with post-secondary institutions. In addition, many California teachers are 
trained in other states. Each teacher preparation program in California is evaluated 
periodically by the Commission on Teacher Credentialing, on the basis of program quality 
standards, which are designed to ensure that each candidate has a thorough and effective 
preparation for classroom teaching. Nevertheless, some legislators and other policy 
makers have advocated a more "candidate-based" certification system, in which the 
competence and performance of each candidate would be measured and verified in a 
standardized process. Some teacher advocates have supported the same concept in 
order to add to the professional stature of teaching. Several universities have advocated 
"candidate-based" assessment as a way to replace or reduce the evaluation of 
campus-based programs of teacher preparation. Other teachers and teacher educators have 
expressed concerns about the potential effects of a standardized assessment process on 
the attractiveness of teaching as a career, on the diverse composition of teaching as a 
profession, and on the curriculum of teacher preparation. Many policy analysts have 
questioned the costs of a statewide assessment process, and researchers have wondered 
whether valid, reliable measurements could be done at a relatively low cost level. 

During the last several years, education policymakers in California have discussed 
these teacher induction issues, and are interested in examining the extent to which a 
state policy to support and assess new teachers would: 

o improve the effectiveness of new teachers; 

o increase the retention of capable new teachers in the profession; 

o improve the process of screening teachers' competence as a basis for 
certification; 

o promote professionalism and commitment to professional development 
among teachers; and 

o contribute to school improvement through greater collegiality and 
involvement in induction by experienced teachers. 

Before describing the pilot study of policy alternatives, we summarize the relevant 
literature to indicate what is currently known about the characteristics of new 
teachers which might guide their support and assessment. 



Review of Literature on New Teachers 



Although education is usually characterized by diverse opinions and controversy, 
in the past few years a general consensus has been reached regarding the role and 
importance of new teacher support programs in improving the education of youth in the 
United States. The consensus is that such programs should seek to address two major 
concerns: the incompleteness of preservice training and the high departure rates of new 
teachers from the profession. 



Incompleteness of Preservice Training 

Effective teachers have mastery of basic concepts in a number of different fields, 
including human development, psychology, sociology, philosophy, communication, and 
the disciplines underlying the subjects they teach. They are also familiar with current 
instructional technology, theories of cognition, and principles of human motivation. 
Effective teachers quickly grasp the philosophy of a school district's curricular goals, and 
translate these goals into classroom instructional activities. Furthermore, they know how 
to adapt instructional strategies to the needs of a variety of learners. 

Clearly, the knowledge base of teaching is very complex. Other professions such 
as medicine, engineering and architecture require a lengthy training period with gradual 
increases in responsibilities (Wise and Darling-Hammond, 1987). Under the current 
structure of teacher preparation, prospective teachers are supervised for a brief period 
of time as they practice applying the concepts and techniques they have learned. They 
then assume full responsibility for teaching their own classes with minimal supervision 
and support. Although collegiate study and supervised student teaching are important 
rehearsals, they do not represent the many complex and varied situations a new teacher 
faces in his or her own classroom. The literature on new teachers shows a growing 
realization that support programs for beginning teachers are needed to complete the 
training and studies that prospective teachers experience in colleges and universities 
(Borko, 1986; Clark et al.; McDonald, 1980). When new California teachers in one 
study commented on their teacher preparation experience, they repeatedly noted their 
difficulty in applying what they learned in coursework to their present classrooms 
(Berliner et al., 1987). For this reason, teacher educators have recognized for many years 
that preservice courses and experiences cannot fully prepare college students to perform 
as excellent practitioners in classrooms. Regardless of the quality and effectiveness of 
prior study and supervised practice, the acquisition of professional practices in 
contemporary classrooms requires extended opportunities to reflect on and discuss these 
practices in a collegial environment. 



Problems of New Teachers 

New teachers face problems in three areas: technical, socioemotional, and 
institutional. Technical problems are related to content transmission, pedagogy and 
management of the classroom. It is common for new teachers to report significant 
difficulties in classroom management (Veenman, 1984), curriculum implementation (Grant and 
Zeichner, 1981; Veenman, 1984; Berliner et al., 1987), managing diversity within the 
classroom (Grant and Zeichner, 1981; Veenman, 1984; Borko et al., 1986; Berliner et 
al., 1987; Berliner et al., 1988), motivation of students (Veenman, 1984) and relations 
with parents (Veenman, 1984; Berliner et al., 1987). The few extant studies of the 
effectiveness of teacher support projects (Varah et al., 1986; Huling-Austin, 1988) provide 
some evidence that induction projects can affect the instructional effectiveness of new 
teachers in comparison with either a control group of teachers who did not receive 
formal support, or the new teachers' effectiveness at the beginning of the support 
project. 

In addition to technical problems, new teachers also report socioemotional prob- 
lems. Most teachers work in isolation from other adults, with few opportunities to 
observe their colleagues (Lortie, 1975). Consequently, they have few chances to com- 
pare their classes and teaching to that of other teachers, or to determine how their 
problems and successes compare with those of other teachers. This lack of information 
and uncertainty magnifies the insecurity and self-doubts of new teachers, who face the 
problems of acquiring and developing materials, lesson plans and tests without the 
experience and expertise that seasoned teachers draw upon. At times, the many demands of 
the classroom intrude on new teachers' personal lives. Not surprisingly, they generally 
appreciate someone who is willing to listen to their problems - both personal and pro- 
fessional - and offer supportive and useful feedback (Borko, 1986). 

The last category of problems that new teachers face is institutional. 
These problems include the tasks of understanding district and school policies, practices, 
and procedures; identifying resources and how to take advantage of them; and becoming a 
member of the community of teachers in the school. Many new teachers have initial 
difficulties in locating and absorbing this critical information (Grant and Zeichner, 1981; 
Odell, 1986). 



Differences between Novice Teachers and Expert Teachers 

As the induction of teachers into the profession has emerged as a major issue in 
education, researchers have begun to study the differences between novice and expert 
teachers. Although we are only in the initial stages of understanding the development of 
expertise in teaching, and in identifying the extent of variation among individual teachers, 
this knowledge is particularly critical to the design of support programs and assessments 
of beginning teachers. As the process of teacher development is better understood, 
it is likely that the principles which guide the practices of expert teachers can be 
incorporated into new teacher support programs. If so, perhaps larger numbers of 
beginning teachers would attain higher levels of expertise and effectiveness, which would 
allow more in-depth assessment of particular knowledge and skills. This section summa- 
rizes briefly the literature on differences between novice and expert teachers. 

When asked to describe their own lesson plans (Leinhardt, 1989) or what they 
observe in a videotape of another classroom's activities (Berliner, 1989), experienced 
teachers provide more detailed descriptions than new teachers, and their descriptions 
exhibit more cohesive themes. Experienced teachers seem to see lessons as composed of 
general pedagogical routines: routines for introducing new concepts, routines for 
applying concepts previously learned, routines for reviewing material previously learned, 
homework collection routines, and groupwork routines. Novice teachers are unlikely to 
use the language of routines to describe classroom activities; indeed, they do not seem 
to perceive the importance of routines. The time spent in routine activities is much more 
variable for novice teachers than for experienced teachers; novice teachers also use a 
more varied and loosely coupled set of activities than experienced teachers. The result 
is that they frequently need to spend time familiarizing students with, and enforcing 
rules regarding, the activity, reducing efficiency in the use of classroom time 
(Leinhardt, 1989). 

This variation in general pedagogical skills is paralleled by variation in the skills 
of content pedagogy. There is less variation in representations of content used by 
experienced teachers than in those used by novice teachers. Experienced teachers are more 
likely to use the same representation of content (e.g., a number line) for a series of 
lessons, while novice teachers often use unfamiliar representations of content to 
introduce new concepts (Leinhardt, 1989). The practice of the novice teachers results in 
more confusion among students, who need to learn a new form of content as well as a new 
concept. The explanations of concepts by experienced teachers are also more concise, 
highlighting prerequisite skills or concepts that the students have already learned 
(Leinhardt, 1989). 

The content knowledge of expert and novice teachers also differs in a similar 
fashion. Expert teachers see the subject as organized in frameworks; novice teachers 
are more likely to see it as a collection of facts (Wilson, 1988; cf. Leinhardt, 1989). The 
extent to which this organization of content knowledge is likely to develop with class- 
room experience is not clear. It seems plausible that a certain level of content knowl- 
edge is prerequisite to conceptualizing the subject matter in terms of frameworks. 






Expert teachers also are better able to articulate a knowledge of students and 
student learning than either novice teachers (Leinhardt, 1983; Wilson, 1988) or subject 
matter specialists whose major focus is content and not teaching (Wilson, 1988). This 
includes generic knowledge, subject-specific knowledge, and topic-specific knowledge 
about students, which experienced teachers formulate into frameworks which guide both 
general and individual instruction. 

These recent studies of novice and expert teachers suggest there is much that new 
teachers need to learn in order to become proficient classroom practitioners. As was 
suggested previously, however, many individuals cannot learn all of the complexities of 
teaching in preservice training programs or supervised practicums. An extended process 
of intensive consultation and mentoring is needed for beginning teachers to acquire the 
skills and knowledge that constitute expertise in pedagogy. 



California New Teacher Project 



To explore innovative methods of new teacher support and assessment, the California 
Legislature, in the Teacher Credentialing Law of 1988 (Chapter 1355 of the Statutes 
of 1988), created the California New Teacher Project (CNTP). The CNTP, jointly 
administered by the Commission on Teacher Credentialing (CTC) and the State De- 
partment of Education (SDE), has three components: support, evaluation, and assess- 
ment. A brief overview of each component and the overall goals of the CNTP are 
found in this section; the assessment component is described in more detail in the fol- 
lowing section. 

During the first year (1988-89), the support component of the CNTP consisted of 
fifteen local pilot projects representing diverse teaching contexts as well as a variety of 
approaches to supporting new teachers. Approximately 650 first- and second-year 
teachers participated in training or seminars sponsored by districts and institutions of 
higher education, worked with mentors and other experienced teachers, and met with 
peer support groups. Support projects funded by the California New Teacher Project 
differ in their areas of emphasis and methods of delivery, but they collectively address 
the technical, socioemotional, and institutional problems of new teachers. It should be 
noted that these projects are not the only new teacher support programs in California; 
others are sponsored by individual school districts or jointly by the SDE and the 
California State University system. However, these fifteen projects have agreed to participate 
in research on alternative methods of new teacher support that is sponsored by the 
CNTP. 

The CNTP evaluation component is investigating the effects of the support on 
new teacher effectiveness and retention, as well as the cost-effectiveness of the various 
methods used to support new teachers in the fifteen projects. Experimentation with 
alternative methods of teacher support, combined with the evaluation of these forms of 
support, should help to identify the kinds of assistance that are most effective as new 
teachers enter the profession. The CTC and SDE have contracted with the Southwest 
Regional Laboratory (SWRL) to conduct all activities in the evaluation component. The 
results of this evaluation are being reported to the CTC and SDE in a separate report 
by SWRL. 



Assessment Component of the California New Teacher Project 

The assessment component of the CNTP consists of the development and pilot 
testing of innovative forms of new teacher assessment. The evaluation of diverse 
approaches to teacher assessments is intended to identify the most promising ways in which 
a comprehensive assessment of teacher candidates could inform the certification process 
and contribute to the quality of teaching. This document reports the analysis of the 
pilot tests that were conducted in the assessment component in 1989, the first year of 
the CNTP. The pilot tests were administered by Far West Laboratory for Educational 
Research and Development (FWL), with assistance in the design, observation, and 
analysis of the pilot tests from RMC Research Corporation. The design and purpose of 
these pilot tests are described in Chapter 2. During 1989, the assessment component 
also included the development of five additional assessments that will be pilot tested in 
the second year of the project. 

The Bergeson Act (S.B. 148) specifically requires that each alternative method of 
support and assessment be evaluated in terms of: 

o its effectiveness at retaining capable beginning teachers in the profession; 

o its effectiveness at improving the pedagogical content knowledge and 
skills of the beginning teachers who are retained; 

o its effectiveness at improving the ability of beginning teachers to teach 
students who are ethnically, culturally, economically, academically, and 
linguistically diverse; 

o its effectiveness at identifying beginning teachers who need additional 
assistance and, if that additional assistance fails, who should be removed 
from the profession of education; 

o the relative costs of the method in relation to its beneficial effects; and 

o the extent to which an alternative method of supporting or assessing 
beginning teachers would, if it were added to the other state requirements 
for teaching credentials, make careers in education more or less 
appealing to prospective teachers. 

Although the support and assessment components are guided by relevant state 
curriculum frameworks and expectations for the pedagogical competence of new teach- 
ers, the SDE and CTC have not generated a list of competencies to serve as a common 
focus for all components of the CNTP. Instead, to increase the variety of methods 
being evaluated, the assessment component is conducted independently of the evaluation 
and support components. For this reason, the competencies being measured by the 
assessment instruments being piloted may or may not coincide with the areas of support 
offered to the new teachers by their support projects. The integration of lessons learned 
from the evaluation and assessment components will facilitate an analysis of relationships 
and interactions among teacher preparation, support, assessment and certification to 
suggest whether and how a program of support and assessment for new teachers should 
be developed in a coordinated manner. 

Since the purpose of this document is to describe and analyze the pilot testing for 
the assessment component, the rationale and design for this component are described in 
more detail in the following section. 



Rationale and Design of the Assessment Component 

As a result of educational reform efforts which focus on the development of 
teaching as a profession, states are moving to candidate-based assessments to supplement 
their existing program-based modes of assessment. To enhance the academic abilities 
of teacher candidates, there is a trend toward setting higher standards for teacher 
preparation programs, increasing the requirements for matriculation, and specifying the 
competencies to be mastered before the completion of programs. States are also adopt- 
ing new assessments whose passage by teacher candidates is required for credentialing. 

Like other states, California is particularly concerned about maximizing the quali- 
ty of teaching in its schools. The CTC has recently revised the Standards of Program 
Quality and Effectiveness (CTC, 1988) for teacher preparation programs. The Commission's 
new standards include a definition of the levels of pedagogical competence and 
performance expected of program graduates. California has also participated in the 
movement to evaluate individual teacher candidates through the use of particular in- 
struments that assess teacher competence: the California Basic Educational Skills Test 
(CBEST), the NTE Core Battery, and the NTE Specialty Area Tests. In recent years, 
these tests have been reviewed by California teachers and teacher educators in terms of 
their appropriateness for use in the credentialing process (Watkins, 1985; Wheeler and 
Elias, 1983; Wheeler et al., 1988). Suggested changes in the fifteen NTE Specialty Area 
Tests, for example, included revision of the content to make the tests more compatible 
with the California Curriculum Frameworks, and augmentation of these multiple-choice 
tests with some type of performance assessment. These changes are currently being 
implemented by the CTC in consultation with the State Superintendent and Educational 
Testing Service. 

Nationally, the interest in assessing the quality of teachers has underscored the 
absence of assessment approaches that are closely related to the tasks that teachers 
perform in the course of their work. This has led to the development of alternatives to 
multiple-choice tests, which historically have been the dominant form of large-scale 
teacher assessments. The alternatives are often referred to as "innovative" or 
performance-based assessments because of their emphasis on direct measurement of actual 
teacher performance. 

A variety of performance-based teacher assessments has been developed in recent 
years, including a number of observation instruments which have been adopted as 
requirements in teacher certification programs in other states. However, many of these 
instruments are quite prescriptive in terms of teaching style. Most of them simply 
measure the frequency of specific behaviors that are generally associated with student 
achievement, rather than assessing the appropriateness of such behaviors when they 
occur in particular situations. The Bergeson Act specifically prohibits the use of check- 
lists of teacher behaviors which tabulate the presence or absence of discrete behaviors. 
Since California classrooms are extremely diverse, instruments which do not fairly assess 
a variety of teaching styles in diverse contexts are inappropriate for use in assessing 
California teachers. For this reason, the CNTP is designed to evaluate the degree to 
which various assessment approaches measure the ability of teacher candidates to teach 
a wide variety of students. 

The Bergeson Act reflects an emerging design for California's assessment of 
teacher candidates in four areas: basic academic skills; subject matter knowledge; 
subject specific pedagogy; and general pedagogy. The CBEST has been judged to be 
suitable for assessing candidate performance in the first area (Watkins, 1986). Two 
current projects that address the second area are: the development of a replacement 
test for the NTE Core Battery Test for the assessment of subject matter knowledge of 
prospective elementary teachers; and the revision and augmentation of the NTE Specialty 
Area Tests for assessing subject matter knowledge of prospective secondary teachers. 
The last two areas were judged to be best assessed after teacher candidates have some 
experience in conducting their own classrooms, i.e., in the first year or two of teaching. 
The CNTP focuses on the identification of promising, cost-effective methods of assessment 
of the last two areas, with special emphasis on six content areas: Elementary 
Teaching, Secondary English, Secondary Mathematics, Secondary Life Science, Secondary 
Physical Science, and Secondary Social Science. 

Because of the high interest in teacher assessment among education professionals 
in recent years, together with a growing recognition of the limitations of multiple-choice 
forms of assessment, new assessment approaches are being developed. These new 
approaches include the evaluation of tasks that resemble those which teachers commonly 
perform in the classroom. Assessment approaches include the use of videotapes or 
written vignettes, structured interviews, structured simulations, and reviews of portfolios 
of a teacher's work. Classroom observation instruments, which assess teachers in the 
course of instruction in their own classrooms, are also being revised and refined. 

In planning the research to be conducted in the assessment component of the 
CNTP, staff from the CTC and SDE considered both the high cost of assessment 
development and the desirability of evaluating a wide variety of assessment approaches. 
Many innovative assessment instruments are in the initial stages of development, and 
could serve only as initial prototypes for exploring the potential of an assessment 
approach, rather than as state-of-the-art instruments reflecting a long period of 
experimentation within the approach. 

To maximize the information to be gathered while minimizing the costs, the 
instruments chosen for pilot testing in the initial year (and the instruments to be 
developed in the subsequent years) were not required to be fully developed products 
whose validity and reliability were well established. Instead, the pilot testing was 
designed to yield information about the strengths and weaknesses of assessment approaches 
for which the specific instruments serve as outstanding exemplars. The purpose of 
the pilot testing is not to consider particular instruments for adoption, but to identify 
promising approaches to the assessment of teachers, to guide future selection and/or 
development of assessment instruments which are tailored to the California context. 
Consistent with this purpose, assessment prototypes were piloted on a small scale with a 
thorough trouble-shooting process in order to learn as much as possible about the 
effects of each approach before conducting expensive, large-scale field tests. 

To obtain the broadest possible representation of assessment approaches, CTC 
and SDE staff began with a review of existing teacher assessment instruments. They 
hoped to avoid as much as possible the high costs of initial development by pilot testing 
existing instruments. However, the state agencies were able to locate and identify only a 
few existing assessment instruments that employ innovative modes of assessment, or that 
assess significant domains of teacher competence that have not been assessed adequately 
in the past. These instruments include: a classroom observation instrument that assesses 
general pedagogy; three semi-structured interviews in elementary mathematics, secondary 
mathematics, and secondary social studies which assess subject-specific pedagogy; and a 
multiple-choice examination that utilizes innovative questions and materials to assess 
general pedagogy and content-specific pedagogy in elementary education. The 
CTC/SDE staff chose to pilot test these instruments in the initial year of the CNTP, and 
to commission the development of additional instruments in other areas which had been 
insufficiently explored. These additional instruments are slated for pilot testing in the 
second and third years of the project. 

A comprehensive teacher assessment system for California cannot be developed 
quickly. For example, classroom observation instruments should reflect the complexities 
of student-teacher interactions, instructional decisions, and student involvement. Most 
performance-based assessments that would capture these complexities are in initial 
stages of development, and would need to be tailored to the California curriculum and 
diverse teaching contexts. During the next two years, the experimental work to be 
undertaken by the CNTP will provide insights into the kinds of assessments that would 
be most cost-effective, when and how those assessments should be administered, and 
how educational groups and organizations can best assist prospective and novice teach- 
ers in environments that feature assistance and accountability. Although the main 
purpose of the assessment component is to evaluate assessment approaches for use in 
credentialing teacher candidates, their capacity to advise teacher candidates of their 
strengths and weaknesses, and to guide the choice of staff development or induction 
activities, will also be considered. 

The specific contributions of the Spring 1989 round of pilot testing are discussed 
in the following section. 



Pilot Testing in 1989 



The purpose of the pilot testing in 1989 was to examine in California the func- 
tioning of several assessment instruments which are considered to be promising exem- 
plars of innovative assessment approaches. The evaluation of the various components 
(e.g., logistical requirements, prompt materials, scoring criteria, training exercises for 
assessors) of these instruments was intended to provide information concerning the 
strengths and limitations of the assessment approaches which the specific instruments 
represented. The pilot tests were not expected to yield definitive measurements of the 
psychometric properties of the instruments because the prototypes had not been suffi- 
ciently developed for that to occur. This focus on trouble-shooting allows small-scale 
pilot testing, requires fewer resources, and considerably increases the number of assess- 
ment approaches which can be examined. The goal of the pilot tests is to suggest 
whether or not it is advisable to invest additional resources in the development of as- 
sessments resembling those piloted. 

This document is the final report and analysis of the Spring 1989 pilot test 
administration. Each of the assessment instruments is described, and the ease of 
administration, scoring, content and format, costs, and technical qualities are analyzed. 
The administration of the pilot tests was described in detail in the Administration 
Report for Spring, 1989. 



Terminology Used in the Report 



Specialized terms and abbreviations which appear in this report are defined 
below: 

Assessment: the process of measuring the performances of new teachers in order 
to help them improve, and to determine whether their performances 
satisfy one or more standards for professional certification as classroom 
teachers. 

Assessor: the person who administers an assessment instrument. 

Candidate: a person participating in an assessment for the purpose of satisfying 
requirements for earning a teaching credential. 

CCI: Connecticut Competency Instrument. A classroom observation instrument 
developed by the Connecticut State Department of Education. 

CNTP: the California New Teacher Project, which evaluates methods of new 
teacher support and assessment. The project has three components: 
sponsorship of new teacher support projects (which numbered 15 at 
the time of the Spring 1989 pilot testing); evaluation of various meth- 
ods of teacher support exemplified by these support projects; and pilot 
testing of innovative assessments of new teachers. 

CTC: the California Commission on Teacher Credentialing. The CTC 
staff shares responsibility for overseeing the California New Teacher 
Project. 

FWL: Far West Laboratory for Educational Research and Development. 
FWL is administering the assessment portion of the California New 
Teacher Project and analyzing the potential of alternative assessment 
approaches for possible future use as new credentialing requirements. 

IOX Assessment Associates: developers of the Elementary Education 
Examination which was piloted as an example of an innovative form of 
the multiple-choice test approach to assessment. 

Project: one of the fifteen support projects in the California New Teacher 
Project sponsored by the CTC/SDE. 

Project Director: a director of one of the fifteen projects. 



SDE: the California State Department of Education. The SDE staff 
administers the California New Teacher Project jointly with the CTC. 
Often the two will be referred to jointly as the CTC/SDE. 

RMC: RMC Research Corporation. RMC staff are collaborating with 
FWL staff in the design and analysis of the pilot tests. 

SWRL: Southwest Regional Laboratory. SWRL is conducting the evalua- 
tion of new teacher support methods exemplified by the CNTP 
projects. 

TAP: the Stanford Teacher Assessment Project, which develops proto- 
types of assessments to be used to certify expert teachers. 

Teacher: first- and second-year teachers with California teaching creden- 
tials. 

The next chapter describes the pilot test design and the processes used to evaluate 
the assessment approaches which were examined in the spring of 1989. Subsequent 
chapters discuss the pilot tests of specific instruments in the following order: the 
Connecticut Competency Instrument (CCI), the Semi-Structured Interview in 
Secondary Mathematics, the Semi-Structured Interview in Elementary Mathematics, 
and the Elementary Education Examination. Reasons for postponing the 
pilot test of the Semi-Structured Interview in Secondary Social Science are also 
discussed. The report concludes with a summary of general lessons learned about 
performance assessments and recommendations for next steps. 



CHAPTER 2: 
PILOT TEST DESIGN AND ANALYSIS 



This chapter describes the design and analysis of the pilot tests of prototypes 
representing various assessment approaches. Subtopics include the source of instrumen- 
tation, the sampling plans, sources of information for evaluating the instruments and the 
assessment approaches, methods of data reduction and major categories of analysis. 
Deviations from the design due to unanticipated events will be described in following 
chapters which focus on the individual instruments. The analysis of the pilot tests is 
contained in two reports. The first report, Administration Report for Spring, 1989, 
describes the administrative aspects of the pilot tests of the different assessment 
instruments and discusses teacher responses. This final report focuses more on the content 
and evaluation of the prototypes, and recommends next steps for the pilot testing of 
additional prototypes. 



Design of Pilot Tests 



This section on the design of the pilot tests describes the sources of instruments 
and the sampling plans. Procedures for data collection and analysis will be described in 
the section on analysis of the pilot tests. 



Sources of Instrumentation 



The four instruments were selected after an extensive search by CTC and SDE 
staff for promising prototypes of innovative assessment formats. The sources of the 
instruments varied, so each will be described separately. 

The Connecticut Competency Instrument (CCI), a classroom observation instru- 
ment that measures general teaching effectiveness in elementary and secondary schools, 
was developed by the Connecticut State Department of Education. Observers who had 
previously been trained in the use of the CCI by the State of Connecticut were used to 
conduct the observations. 

The semi-structured interviews came from two different sources. The Semi-Structured 
Interview in Secondary Mathematics (SSI-SM) was developed by the State of 
Connecticut, which provided previously trained assessors from Connecticut to administer 
the assessment. The scoring system was in the process of development at the time of 
administration; substantial progress in development was made, and portions of the 
interviews were scored. The Semi-Structured Interviews in Elementary Mathematics and in 
Secondary Social Studies (SSI-EM and SSI-SSS respectively) were developed by the 
Teacher Assessment Project (TAP) at Stanford University as part of their work with the 
National Board for Professional Teaching Standards. The interviews from Stanford were 
originally developed to identify expert teachers, so the questions and scoring system were 
[illegible] appropriate for beginning teachers. No trained assessors were available, 
so a TAP representative trained the assessors and scorers for all Stanford instruments. 



The Elementary Education Examination, a multiple-choice test, was designed for 
beginning teachers by IOX Assessment Associates (formerly called the Instructional 
Objectives Exchange) under a contract with the State of Connecticut. Although we refer 
to it as a test, it is actually a collection of items placed into six test forms. These 
items were pilot tested to assess their feasibility for incorporation into a test of 
competence in elementary education which includes both pedagogy and content knowledge. 
IOX provided all materials and assumed full responsibility for administering and scoring 
the examination. 


Sampling Plan 

Several factors constrained the construction of the pilot test sampling plan. 
Among them were the necessity of planning, scheduling and administering five assessments 
within a [illegible] and the desirability of using the teachers in the fifteen 
pilot projects who had already consented to participate in assessment pilot testing. 

We began the sample selection process by assembling lists of possible participants 
[illegible]. After the lists were completed, the characteristics of grade level, 
school context (urban, suburban and rural), gender and ethnicity were considered in 
selecting teachers from those projects with a suitable concentration of teachers with the 
appropriate credential. (The threshold number varied with the particular assessment 
instrument being piloted, ranging from eight for the semi-structured interviews to thirty 
for the multiple-choice examination.) For example, for the SSI-EM, lists of secondary 
[illegible] were assembled, and projects with more than eight teachers were contacted. 
Although we wanted to maximize variation in the characteristics of teachers selected, 
our ability to do so was limited by the information which we had about project 
teachers, the time required to recruit nonproject teachers, and the small samples. 
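
The threshold step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the roster, the project names, the credential label, and the function name `eligible_projects` are all hypothetical and do not come from the pilot test records, and the comparison is taken to be strictly "more than" the threshold, following the wording of the example in the text.

```python
from collections import Counter

def eligible_projects(roster, credential, threshold):
    """Return projects with more than `threshold` teachers holding `credential`.

    `roster` is a list of (project, credential) pairs, one per teacher.
    The strict "more than" comparison is an assumption based on the text.
    """
    counts = Counter(project for project, cred in roster if cred == credential)
    return sorted(project for project, n in counts.items() if n > threshold)

# Hypothetical roster: names and counts are invented for illustration only.
roster = (
    [("Project A", "credential X")] * 9
    + [("Project B", "credential X")] * 4
    + [("Project B", "credential Y")] * 12
)

# Threshold of eight, as reported for the semi-structured interviews.
print(eligible_projects(roster, "credential X", 8))
```

Characteristics such as grade level, school context, gender and ethnicity would then be weighed by hand within the projects that pass this filter.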

Information on the ethnicity of teachers was available for many of the projects, but there 
were few nonwhite teachers, precluding the selection of a significantly large subsample. 
Information on school context was minimal, based solely on classifications of district 
provided by the New Teacher Support Projects. We also tried to include teachers 
from each project in at least one pilot test, though no attempt was made to equalize the 
participation rate across projects. 

Considerations of administration costs and time constraints led us to not include 
some project teachers from remote areas. In the case of the classroom observation 
assessment and one of the semi-structured interview assessments, the use of Connecticut 
assessors who were only available for a specific week limited flexibility in selecting 
teachers because of constraints on time for travel to multiple sites. To complete the 
pilot tests on schedule, the recruitment of nonproject teachers was limited to those 
with the appropriate credential who were located near an identified sample of project 
teachers. Most nonproject districts could identify teachers in their first year in the 
district, but could not readily determine whether these teachers were in their first year of 
teaching. Some nonproject districts contacted had a time-consuming approval process 
required for the release of teachers' names. Therefore, the use of nonproject teachers 
was minimized. 

The characteristics of teachers in the samples are described in more detail in the 
chapters that focus on specific instruments. 

Analysis of Pilot Tests 

This section describes our procedures for data collection and reduction, as well as 
the key analytic categories focusing on specific aspects of instruments. The data collected 
also served as a basis for judging the potential of the assessment approach which the 
particular instrument utilized. 

Data Collection 

Since the same methods of data collection were used for all assessment instruments, 
they will be discussed together. Several sources of data were used: 

o evaluation feedback forms for teachers who participated in the pilot 
tests; 

o evaluation feedback forms for the assessors and scorers; 

o observations of the administration of each assessment and the training 
of assessors and scorers recorded in field notes by FWL and RMC staff; 

o scores that reflected the performances of participating teachers on the 
assessment instruments; and 

o the relevant Curriculum Guide(s) or Framework(s) and the California 
Standards for Beginning Teachers. 

Following guidance from the funding agencies, RMC staff developed an outline of 
issues to be addressed in evaluation feedback forms to be completed by participating 
teachers, assessors, and scorers. FWL staff then developed separate forms for each 
group which were tailored to specific assessment instruments. These forms were given 
to teachers upon the completion of each assessment, except in the case of the classroom 
observation instrument, where they were mailed. Assessors and scorers returned 
completed forms when they presented invoices for payment. Since the emphasis in the pilot 
tests was on trouble-shooting, the evaluation feedback forms focused on critical 
evaluations of the instruments with respect to the analytic categories described later. Most of 
the questions were either open-ended or required yes/no answers with spaces provided 
to elaborate. 

Field notes were taken during observations of the assessment administrations. 
[illegible] staff attended most administrations of the assessment instruments; 
FWL staff observed the training of assessors and scorers. When they were familiar with 
the subject matter, FWL staff also served as participant observers for scoring to obtain a 
more complete understanding of the performance of the assessment instruments. RMC 
staff served as participant observers for the classroom observation instrument. FWL 
staff also participated as assessors for some administrations of one of the semi-structured 
interviews. 


[illegible] the instruments were scored; the interpretation of results was not always 
straightforward because scoring systems varied in terms of the stage or level of 
development. For example, the classroom observation instrument had a well-developed 
scoring system which had been previously piloted and revised to produce greater 
inter-rater reliability. In contrast, the scoring systems for the semi-structured interviews were 
devised or revised after administration of the pilot tests, and hence were unknown to the 
interviewers, creating some inconsistencies between questions and probes and the 
scoring system. Training for scoring varied according to the level of development of 
the scoring system. 


The content of each prototype was compared to all of the relevant California 
Model Curriculum Guides and Frameworks, and with the California Standards for 
Beginning Teachers. The Model Curriculum Guides and Frameworks are recent 
documents produced by subject matter panels convened by the California State 
Department of Education. Reflecting a consensus among panel members on the content and 
philosophy of instruction, these documents are expected to guide curriculum 
development and instruction in the subject in California public schools. If there were two or 
more Guides or Frameworks addressing a particular subject area, the most recent one 
was used. 


The California Beginning Teacher Standards are standards that define the level of 
pedagogical competence and performance that the Commission on Teacher Credentialing 
expects the graduates of credential programs to attain as a condition for program 
[illegible]. These standards (standards 22 through 32) are listed in Standards of Program 
Quality and Effectiveness, Factors to Consider and Preconditions in the Evaluation of 
Professional Teacher Preparation Programs for Multiple and Single Subject Credentials. 
(Other standards address more general program requirements; these focus specifically on 
candidate competencies.) Although these are standards for teacher preparation programs 
and not teacher candidates, they identify the knowledge and skills that beginning 
California teachers are expected to attain. 

Data Reduction 

Data reduction techniques varied with the data collection method. Fixed-re- 
sponse questions on the evaluation feedback forms completed by teachers participating 
in the pilot tests, assessors and scorers were tabulated by hand. Most of the questions, 
however, were open-ended. Surveys were reviewed, and response categories were 
developed to code the open-ended responses and comments. In addition, responses 
which either stated a common viewpoint well, or which provided an additional 
perspective, were culled for possible quotation in the reports. For the fixed-response questions 
where elaboration was invited, positive responses were less likely to be elaborated than 
negative ones, so many more negative evaluations were available for quotation than 
positive ones. 

Field notes were reviewed for relevant information that addressed the analytic 
categories, and this information was incorporated into the chapters about specific instruments. 

For the multiple-choice examination, there was a large enough sample to permit 
extensive analysis of scores by subgroups. For other assessment instruments, the general 
distribution of scores was examined; in some cases, the scores of teachers from nonwhite 
ethnic groups were examined separately. Some exploratory analyses were performed to 
assess the internal consistency and rater agreement on the secondary math interviews. 
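
As an illustration of what such exploratory analyses can involve, the sketch below computes two common statistics: the proportion of exact agreement between two raters, and Cronbach's alpha as an index of internal consistency. The report does not state which statistics were actually used, so the choice of these two, the function names, and the sample scores are all assumptions made for illustration.

```python
from statistics import pvariance

def percent_agreement(rater_a, rater_b):
    """Proportion of candidates given identical scores by two raters."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of rows, one row of item scores per candidate."""
    k = len(item_scores[0])                       # number of items
    columns = list(zip(*item_scores))             # scores grouped by item
    item_variance = sum(pvariance(col) for col in columns)
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_variance / total_variance)

# Hypothetical scores, invented for illustration only.
rater_a = [1, 2, 3, 3]
rater_b = [1, 2, 2, 3]
print(percent_agreement(rater_a, rater_b))

item_scores = [[1, 1], [2, 2], [3, 3]]            # three candidates, two items
print(cronbach_alpha(item_scores))
```

With samples as small as those in the interview pilot tests, figures of this kind are best read as descriptive rather than as stable estimates.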

The Model Curriculum Guides and Frameworks were examined by FWL staff 
with the appropriate subject matter background. Their professional judgments were 
used to draw conclusions about the congruence of the assessment instruments with the 
relevant Guide or Framework. The reasoning underlying these judgments is described 
in detail in the chapters on the specific prototypes. 



Overview of Analytic Categories 

The same general analytic categories were used to appraise all assessment in- 
struments. They included: administration, content, format, cost analysis, and technical 
quahty. These categories and their subcategories are discussed below. 

Administration of assessment. This category included consideration of the 
logistics, security needs, and training of assessors and scorers for the particular assessment 
instrument. Generally, this category generated information required to estimate 
administrative requirements and cost projections. The logistics required for administration 
predict the ease of administration if the assessment approach were to be implemented 
on a statewide basis. Generally, the more complicated the logistical requirements, the 
more expensive the assessment is to administer. The needs for security impact not only 
logistical requirements, but also the frequency with which the instrument must be revised 
for statewide administration. Consideration of the training of assessors and scorers 
suggests the degree of difficulty to be anticipated in recruiting people with the required 
professional expertise, and the time required to prepare personnel to administer and 
score the particular assessment instrument. 

Assessment content. This category addressed the specific instrument's congruence 
with the relevant Curriculum Guide or Framework, and the extent to which the California 
Standards for Beginning Teachers were covered. It also included an examination of 
the content of the assessments along the following dimensions: job-relatedness, 
appropriateness for beginning teachers, appropriateness across varying teaching contexts, 
fairness across different groups of teachers, and general appropriateness of the assessment 
approach represented by the prototype as a method of assessing teachers. Since 
none of the assessment prototypes was specifically developed for use in California, 
comparison of the assessment content with the relevant Curriculum Guide and the California 
Standards for Beginning Teachers was necessary to determine whether the assessment 
approach was compatible with the instructional philosophy underlying the various 
California curricula and the competencies specified for teacher candidates. Since one 
common criticism of teacher assessment instruments is that scores have not been shown 
to be closely related to specific teaching competencies, job relevance was included as an 
analytic category. The more closely the assessment tasks resemble the activities that 
teachers do in the course of their teaching duties, the higher the potential relationship of 
scores to actual teaching competencies. 






Since the CNTP focuses on the assessment of teachers early in their teaching
career, it is important to judge the appropriateness of each assessment in terms of
performance expectations and perceived difficulty for teachers at this stage of career
development. Appropriateness across contexts is particularly important for California,
since it has a wide diversity in student populations. The issue of fairness across groups
of teachers relates to the potential for bias with regard to any particular group of teachers.

Assessment format. This category included the general clarity of orientation
materials and instructions, as well as the identification of important features peculiar to
a particular assessment format. In order for the performance of candidates to reflect
their true competencies, it is essential that each candidate have clear and accurate
expectations of the performance which is expected of them. This is not possible when
teachers are uncertain as to what they are being asked to do. This category also covers
features which are peculiar to particular assessment formats identified as either
problematic or critical to successful implementation of the assessment approach.

Cost analysis. Based on the pilot testing experience, we attempted to project the
costs of a statewide administration of an instrument which resembled the prototype.

Technical quality. This category discussed the work performed to date in the
development of the prototype. Although few data were available to assess the reliability
and validity of any instrument, procedures for doing so were recommended.

This chapter has outlined the general design for the Spring 1989 pilot tests in the
assessment portion of the California New Teacher Project. The following five chapters
discuss each of the assessment instruments: the classroom observation instrument
(Connecticut Competency Instrument or CCI), a semi-structured interview in secondary
mathematics (SSI-SM), a semi-structured interview in elementary mathematics (SSI-EM),
and an innovative multiple-choice test (Elementary Education Examination).



CHAPTER 3:

CONNECTICUT COMPETENCY INSTRUMENT (CCI)



The Connecticut Competency Instrument (CCI) is a classroom observation system 
developed by Connecticut's State Department of Education. Through this system an 
observer conducts a 45-60 minute classroom observation, focusing on ten indicators of a 
teacher's classroom performance. These ten indicators, grouped in three clusters to
represent three major areas of instruction, are as follows: 

I. Management of the Classroom Environment

   a. Promoting a positive learning environment
   b. Maintaining appropriate standards of behavior
   c. Engaging students in activities of the lesson
   d. Effectively managing routines and transitions

II. Instruction

   a. Creating a structure for learning
   b. Presenting appropriate lesson content
   c. Developing a lesson to promote achievement of lesson objectives
   d. Using appropriate questioning techniques
   e. Communicating clearly

III. Assessment

   a. Monitoring student understanding and adjusting teaching

In addition to the observation which focuses on the above ten indicators, the CCI 
system includes a pre-assessment information form, completed by the teacher, which 
informs the observer of the learning objectives, activities, instructional arrangements, and 
materials associated with the lesson. Next there is a pre-observation interview in which 
the observer meets with the teacher to review the information included in the pre-as- 
sessment information form. Finally, there is a post-observation interview in which the 
teacher meets briefly with the observer to explain any deviations from the plan that may 
have occurred during the lesson. 

A key feature of the CCI is the analysis and rating process. After scripting what
takes place in the classroom as accurately as possible, the observer completes a one-page
form for each of the ten indicators. In one column of the form, the observer
writes evidence from the script that supports the indicator, and in another column she/he
records evidence that does not. The recorded evidence is specifically tailored to one or
more of the attributes that define each of the indicators. For example, for the first
indicator, "promoting a positive learning environment," the observer records positive and
negative (if any) evidence from the script for each of three defining attributes: rapport,
communication of expectations for achievement, and physical environment. Each of
these attributes is also defined so the observer would record evidence that, for example,
the teacher has or has not established rapport by "demonstrating patience, acceptance,




empathy and interest in students through positive verbal and non-verbal exchanges." 
After the careful recording of evidence, the observer then weighs the evidence in both
columns in order to rate the teacher's performance on the indicator as either "Accept- 
able" or "Unacceptable." 

For each of the ten indicators, the CCI includes one or more attributes that
elaborate on the meaning of the indicators. Chart 3.1 shows the defining attributes for
two of the ten indicators. Connecticut has developed operational definitions for all CCI
indicators and attributes. These definitions are an important part of the training of CCI
observers.

The CCI is not a typical classroom observation system, but could be considered a
state-of-the-art representative of the classroom observation approach to teacher
assessment. Its attention to specific evidence regarding teaching abilities distinguishes the
CCI from most other classroom observation instruments, which generally tend to be
either structured checklists or rating-scale instruments. It is further distinguished from
most other classroom observation instruments in that it (1) acknowledges that competent
teaching may be manifested in diverse ways, and (2) emphasizes the importance of the
professional judgment of trained assessors in making decisions about teacher competence.
To better understand why the CCI was chosen for pilot testing in California, we
can compare the CCI to classroom observation instruments that are used in two other
states: Florida and Georgia. Two classroom observation instruments - a Summative
Observation Instrument and a Formative Observation Instrument - are used in Florida's
Beginning Teacher Program. Both require an observer to mark in a box whenever a
specified behavior is observed. The behaviors (also referred to as indicators) are
organized under six domains and are described as dichotomous pairs. For example, the first
behavior for the Summative Observation Instrument is the way a teacher begins
instruction. The observer evaluates the teacher's initiation of the lesson and selects either the
box marked "promptly" or the box marked "delays," with no intermediate evaluation
possible. In a sixty-minute period, the observer is to mark the frequency of twenty-one
behaviors, all but four of which are described in dichotomous terms. This instrument,
therefore, emphasizes the occurrence of specified teaching behaviors with little or no
regard for the appropriateness of those behaviors. For example, it may be appropriate
for a teacher to delay initiation of a lesson, but this instrument does not allow the
observer - or the teacher - to make such a judgment.

The state of Georgia also uses two classroom observation instruments, which are
collectively known as the Teacher Performance Assessment Instruments (TPAI), in its
assessment of beginning teachers. These instruments require an observer to give a
rating, using a five-point scale, to each relevant behavior (or indicator) observed. To
guide the observer in giving the rating, each indicator is defined by a range of teaching
behaviors referred to as descriptors. In the case of some indicators, the descriptors
constitute the rating scale; in other instances, the number of descriptors observed is the
basis for scoring a teacher's performance. For example, for the indicator, "uses
procedures which get learners initially involved in lessons," four descriptors are provided. A
rating of "1" would be given if "none of the descriptors is evident," and a rating of "5"
if "four of the descriptors are evident." Between the two instruments, a total of 30
indicators are rated during each 60-minute observation.
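
The descriptor-count form of TPAI scoring described above can be illustrated with a short sketch. Only the two endpoints for a four-descriptor indicator are documented here (a rating of "1" when none of the descriptors is evident, a rating of "5" when all four are); the one-point-per-descriptor steps in between, and the function name `tpai_rating`, are assumptions made for illustration and are not taken from the Georgia instruments.

```python
def tpai_rating(descriptors_observed: int, descriptors_total: int = 4) -> int:
    """Map a count of observed descriptors to a five-point rating.

    Assumption: one rating point per observed descriptor, which agrees with
    the two documented endpoints (0 of 4 -> 1, 4 of 4 -> 5).
    """
    if descriptors_total != 4:
        raise ValueError("this sketch only covers four-descriptor indicators")
    if not 0 <= descriptors_observed <= descriptors_total:
        raise ValueError("observed count must be between 0 and 4")
    return 1 + descriptors_observed
```

Under this rule the rating depends only on how many of the listed techniques appear, not on whether they suited the lesson, which is the contrast with the CCI drawn in this chapter.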






CHART 3.1

DEFINING ATTRIBUTES FOR TWO INDICATORS:
CONNECTICUT COMPETENCY INSTRUMENT


IC. THE TEACHER ENGAGES THE STUDENTS IN THE ACTIVITIES OF THE LESSON.

Defining Attributes

(1) Student engagement:
    The beginning teacher engages a clear majority
    (at least 80 percent) of the students in the
    instructional activities of the lesson.
    Engagement is defined as students' involvement
    in lesson activities consistent with the teacher's
    expectations or directions.

(2) Re-engagement:
    When any students are persistently off-task,
    the teacher attempts to bring them back on
    task.


IID. THE TEACHER USES APPROPRIATE QUESTIONING TECHNIQUES.

Defining Attributes

(1) Appropriateness to lesson content:
    Questions must be related to the content
    of the lesson and appropriate to the lesson
    objectives.

(2) Responding to students:
    Teachers should respond to student answers
    or failures to answer. When appropriate,
    teachers should also build upon student
    answers to work toward the lesson objectives.

(3) Opportunities for student involvement:
    Opportunities for student involvement are
    provided by appropriate wait time and by
    addressing questions to a variety of students,
    encouraging most students to be involved.
    Teachers should distribute response
    opportunities to all students. Wait time
    should be suited to the type of question
    asked.

(4) Cognitive level:
    Level of questioning must be appropriate to
    the teacher's objectives. If the teacher
    is seeking recall of basic facts or concepts,
    then lower-order cognitive questions may be
    appropriate. If the teacher's purpose is to
    stimulate higher-order thinking, problem-solving
    or generalizing, then higher-order cognitive
    questions should be asked. In many lessons, a
    variety of questioning levels will be appropriate.



In addition to using a five-point scale rather than a pass/fail scale, the TPAI is
further distinguished from the CCI in that it is very prescriptive about instructional
methodology. For instance, for the indicator regarding the initiation of lessons (i.e., creating a
structure for learning), the CCI relies on the observer's professional judgment regarding
the appropriateness of the initiation in relation to the lesson objective(s). In contrast,
for a similar indicator, the Georgia instruments specify four techniques to stimulate the
interest of students, and then establish a rating procedure based on how many of the
techniques are used; the more techniques observed, the higher the rating. This
prescriptive definition of instruction does not allow for as wide a variety of teaching styles as
does the CCI system.

The CCI, which has undergone extensive development and revision since 1985-86,
is currently part of Connecticut's induction program for beginning teachers, the
Beginning Educator Support and Training (BEST) Program. Connecticut is using the CCI to
assess eligibility for provisional certification starting in 1989-90.

Although Connecticut requires that a beginning teacher be observed on six
occasions by six different assessors, for California's pilot test the CCI was used for a single
observation of each teacher. A single observation would not yield a sufficient sample of
teaching evidence to make credentialing judgments. However, the focus of the California
pilot test was to evaluate the instrument in more varied contexts than are available
in the state of Connecticut; for this purpose, a single observation was judged acceptable.
Also, a single observation per teacher was deemed sufficient for the purpose of trying
out a high-inference classroom observation instrument, since much is already known
about more behavioristic approaches.

The administration of the CCI in this pilot test, the content of the instrument,
and the assessment format are discussed below. The content and format sections of the
report contain information from the teacher and assessor evaluation forms, as well as
information and analysis of scoring results. Following these three sections are sections
on cost analysis and technical quality. The chapter concludes with an overall summary
together with recommendations for further steps in exploring the feasibility and utility of
high-inference classroom observation instruments such as the CCI in California teacher
assessment.



Administration of Assessment



Beginning with an overview of the administration of the CCI, this section contains
information on the following: logistics (e.g., identifying the teacher sample, scheduling
classroom observations, etc.), security, assessors and their training, scoring, and
perceptions of the instrument by teachers, assessors and FWL and RMC staff members.



Overview 

The administration of an observation system, such as the CCI, in a new teacher's 
classroom requires careful planning on the part of the state, the observer, the new 
teacher, and the school administrator. Despite a very tight timeline, the use of trained 
assessors from the state of Connecticut made it feasible for FWL to complete the pilot 




test of the CCI during two weeks in May 1989. As shown in Tables 3.1 and 3.2, the
pilot testing was done in six California New Teacher Project locations, by six different
trained assessors (who were at times accompanied by untrained independent observers
from FWL and RMC), at different grade levels, and in several subject areas. Forty-one
teachers participated in this pilot test.



Logistics 

Administration of the CCI required the following logistical activities: identifying
teacher samples, scheduling the observations, arranging for facilities in which to conduct
the pre- and post-observation meetings, making travel arrangements for the Connecticut
assessors (hereafter referred to as CT assessors), sending the orientation and CCI
materials to the teachers, and acquiring evaluation feedback from the teachers and assessors.

As Table 3.1 indicates, the teacher sample for this assessment was almost equally
divided between Southern and Northern California. Although we strove to ensure a
roughly equal mix of elementary and secondary teachers, and a variety of teaching
contexts, the highest priority became securing groups in compact geographic areas so as
to reduce assessor travel time. In the Chico Project, for example, some of the rural
schools participating in the Project are two to three hours apart.

After participants were identified, teachers and principals were contacted to
schedule the 45-60 minute observation and to arrange for a 15-20 minute pre-observation
meeting and a 5-10 minute post-observation meeting at the school site. In accordance
with the request of the experienced CT assessors, no more than two observations
were scheduled for any one day, and two observations in one day were never scheduled
two days in a row. (FWL and RMC staff who functioned as untrained observers for this
assessment quickly discovered the necessity of this scheduling arrangement because
scripting two lessons and completing two analytical write-ups in a single day was
physically and mentally exhausting.) Although the CT assessors varied in their range of
experience with regard to grade level and subject matter, it was not possible, due to the
small number of assessors, to arrange for assessor-teacher matches along these
dimensions.
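
The two scheduling rules requested by the CT assessors can be expressed as a simple check on one assessor's week. The sketch below is illustrative only: the function name, the representation of a week as a list of daily observation counts, and the sample schedules are assumptions, not records of the actual pilot-test calendar.

```python
from typing import Sequence

MAX_PER_DAY = 2  # no more than two observations on any one day


def schedule_is_acceptable(observations_per_day: Sequence[int]) -> bool:
    """Check consecutive working days for one assessor against both rules.

    Rule 1: at most two observations per day.
    Rule 2: two-observation days never fall on consecutive days.
    """
    previous_was_double = False
    for count in observations_per_day:
        if count < 0 or count > MAX_PER_DAY:
            return False
        is_double = count == MAX_PER_DAY
        if is_double and previous_was_double:
            return False
        previous_was_double = is_double
    return True


# Hypothetical ways of fitting seven observations into a five-day week.
print(schedule_is_acceptable([2, 1, 2, 1, 1]))  # True
print(schedule_is_acceptable([2, 2, 1, 1, 1]))  # False
```

With seven teachers per assessor and five working days, both rules leave little slack, which is consistent with the assessors' later comments on workload.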

Shortly before the observations, the participating teachers received orientation
materials and a full copy of the CCI (including copies of the pre- and post-interview
questions). Soon after the observations, teachers received an evaluation form to fill out
and return to FWL. (Unfortunately, due to a clerical error, many of the teachers
received the forms a month after the assessment. As a result, the return rate for the
forms was low. Only 17 of the 41 teachers completed the forms.) Evaluation forms
were also given to each of the CT assessors, who returned them to FWL along with their
assessment records.



Security

Because each teacher received a full copy of the CCI, the main focus of the
security effort in this pilot test was on the completed documentation for each teacher.
Assessors mailed the documentation materials to FWL, where they were securely filed.






TABLE 3.1

PILOT TEST PARTICIPANTS:
CONNECTICUT COMPETENCY INSTRUMENT (CCI)

(Total Number of Teachers = 41)

Dates     Project                   Assessor                         Number of  Teacher
                                                                     Teachers   Characteristics

May 1-5   Long Beach                Bilingual Chapter I Teacher      7          3 Elementary; 4 Junior High
                                                                                1 Male; 6 Female

May 1-5   Santa Barbara/Ventura     Trainer of Trainers              7          7 Elementary
                                                                                1 Male; 6 Female

May 8-12  Chico                     Assistant Principal              7          5 Elementary; 2 High School
                                                                                2 Male; 5 Female

May 8-12  Lodi                      Instructional Consultant         7          7 Elementary
                                                                                0 Male; 7 Female

May 8-12  Riverside/San Bernardino  Department of Education Staff    6          4 High School; 2 Middle School
                                                                                1 Male; 5 Female

May 8-12  Winters                   Higher Education Representative  7          2 Elementary; 3 High School;
                                                                                2 Middle School
                                                                                4 Male; 3 Female




TABLE 3.2

CONNECTICUT COMPETENCY INSTRUMENT:
SUBJECTS BY GRADE LEVELS OBSERVED

SUBJECT                           GRADES                        TOTAL
                                  K-3

Reading                            2      4      1      1          8
Language Arts/English/Spelling     5      3*     2      2         12
Science                            1      4      1      -          6
Social Studies                     1      -      1      1          3
Mathematics                        4*     1*     2      2          9
English as a Second Language       -      -      -      1          1
Music                              1*     -      -      -          1
Health and Physical Education      1      1      -      -          2
Other Subjects                     -      -      -      2          2

TOTAL                             15     13      7      9         44

*Some observed lessons included multiple subjects.






If an observation system like the CCI is selected as a method of assessment for
credentialing teachers in California, procedures to ensure security at the observation and
processing stages (and during longer-term storage) would have to be developed and
implemented by California. Each piece of documentation would have to contain
identifying information (e.g., teacher code, observer code, date of observation) in case the
pieces became separated. All documentation for a given teacher credential candidate
would also have to be retained for a minimum number of years, enough to cover the
period in which teachers could appeal decisions, or to meet statutory requirements.



Assessors and Their Training 

Six trained assessors from Connecticut were invited to participate in this pilot
test. The use of trained assessors from Connecticut rather than California assessors
reduced the costs of the pilot test considerably. Had we recruited California assessors, it
would have been necessary to train them. The use of trained Connecticut assessors also
reduced the amount of staff time required to coordinate the pilot test (e.g., no
recruitment or training was necessary), and enabled us to complete the pilot test in a relatively
short period of time (two weeks).

In addition to already being trained, the Connecticut assessors had previous
experience conducting the CCI assessment in Connecticut. This experience ranged from 
two assessors who had conducted three assessments of beginning teachers to one asses- 
sor who had conducted approximately twelve assessments. 

Because the CCI design is based on the philosophy that the professional judg- 
ment of trained assessors is critical in making decisions about teacher competence, the 
CCI training process for assessors is an important component of the CCI system. While 
acknowledging that classroom teaching experience is a valuable basis for professional 
judgment, the creators of the CCI also realized that experienced educators have their 
own ideas and methods for determining effective teaching. The goal of the CCI system 
is to complement experience with training so as to ensure that the assessment criteria 
are consistently and objectively applied in rating teacher performances. 

The training process for CCI assessors consists of five intensive days of instruction
and practice, an independent field assignment, two days of follow-up instruction, and a
proficiency test. During the five days of training, assessor candidates meet in groups of
ten and work with two trainers to learn the following: the content and meaning of the
ten indicators, the CCI standards, the procedures for conducting an observation, the
skills necessary to document (or script) relevant information during the observation, and
the skills necessary to write, weigh, and rate evidence from the scripted documentation.
The training is conducted via whole- and small-group discussions and activities, with a
focus on extensive daily practice in scripting and analyzing videotaped lesson segments
representing one or more of the indicators.

Following the five days of training, the assessor candidates are given an
independent field assignment which requires them to select and observe a teacher (someone
not in Connecticut's beginning teacher program) and then to write and analyze evidence
based on the observation. The results of this assignment are shared and discussed
during the two days of follow-up training. Then the assessor candidates are given a
proficiency test in which they analyze two videotapes of classes taught by beginning
teachers.

A staff member from FWL and one from RMC participated in the five-day
training sessions in the summer. Both had experience with the CCI when they functioned as
independent observers in the spring pilot testing, so neither entered the training as a
complete novice. Nevertheless, both found the intensive training to be stimulating and
valuable. The daily discussions among participants, which centered on the content and
meaning of each of the ten indicators of good teaching, as well as on how to write,
weigh, and rate evidence for the indicators, were invaluable as a means of helping
participants fully understand the meaning of the indicators and the rating standards. The
continual professional interchanges also provided participants with fresh and stimulating
insights into teaching in general and into their own teaching in particular. Both during
and after the training, most participants claimed that, because of the training, they would
be much better teachers when they returned to the classroom. Participants also
expressed a renewed professional commitment to teaching and strong enthusiasm for
participating in the process of inducting new teachers into the profession.

Although both FWL and RMC staff members found the training to be valuable,
they also felt the training could be improved in at least two ways. These two factors
could be instrumental in the event that California decides to develop a comparable
system for its teacher certification process. First, more specific examples of written
evidence for each indicator could be provided and utilized as part of the training. This
would eliminate a lot of time spent by the trainees writing evidence inappropriately.
(Written examples are available in the trainee's handbook, but these were seldom
referred to by the trainers.) Second, the amount of scripting from videotapes could be
reduced so that more time could be given to writing, weighing, and/or rating evidence.
Although the daily scripting from videotapes shown on a medium-sized television screen
provided beneficial practice in scripting, it was also an artificial situation. Scripting from
a TV screen is not the same as scripting in a classroom. It is harder to "observe" the
whole picture (i.e., the classroom and its participants) when the picture is limited to an
area the size of the television screen. In addition, focusing on and scripting from a
relatively small TV screen (compared to the size of the classroom) is very hard on the eyes.
Instead of scripting as much, trainees could be given typewritten scripts for practice in
writing, weighing, and/or rating evidence. In the latter two areas especially, some
trainees experienced confusion even at the end of the training. Since the evidence
component is one of the key features of the CCI that distinguishes it from other classroom
observation systems, it is important that there is consistency among assessors.

Both staff members also considered the issue of shortening the training. The
RMC staff member believes the training in Connecticut could be shortened if it is better
organized (e.g., more specific examples are provided before exercises, more materials to
read and study before the training). The FWL staff member agrees that the training
would be improved with better organization, but is not sure the training should be
shortened. As described above, in five days the assessor candidates are introduced to and
expected to learn not only the content of the assessment and how to conduct the
assessment, but also how to score the assessment. The participants are trained in how to
be competent assessors and competent scorers at the same time. A total of five days of
training, which is approximately 2 1/2 days each for assessor training and scorer training
covering 10 conceptually distinct indicators, does not seem to be an excessive amount of






time. Moreover, because the CCI indicators must be applied to a broad range of
teaching contexts, teaching styles, and pedagogical techniques, extensive discussion of
applications of the indicators is necessary in order to ensure that observers are able to
implement the CCI (or any high-inference observation instrument) fairly. Finally, any
training of California assessors in the use of a high-inference observation instrument
such as the CCI would also have to address the complexities of California classrooms
(e.g., the diversity of students, large class sizes, the use of instructional aides). The
length and structure of assessor training should be based on a careful evaluation of what
skills are needed by assessors to achieve a high degree of quality, consistency and reliability
in the assessments.

In Connecticut, three types of assessors are used to administer the CCI: (1) state
assessors, (2) administrator assessors, and (3) teacher assessors. Each beginning teacher
is observed by two of each type for a total of six observations. The teachers
participating in this pilot test were asked for their suggestions as to who should administer a
classroom observation assessment (district administrators and assessors outside the district
were given as examples of possible answers). The teachers' answers were as follows:

Assessors outside the district 9 

Other teacher(s) 1 

Site administrator 1 

Other 3 

No answer 3 

Of the nine teachers opting for assessors outside the district, almost all did so
because they believe such persons to be "less threatening" or "less intimidating," or
because they would be "fair and unbiased," and there would be less chance of "playing
favorites." The "Other" category included teachers who suggested that the instrument be
administered by people who were well-trained in using it.



Scoring 

The scoring system of the CCI is an integral part of the CCI process. That is, the
same person who conducts the classroom observation uses the documentation from the
observation to score the observation. The scoring system begins with a documentation
form for each observation. The form requires the assessor to provide, from the scripted
lesson, summaries of both positive and negative evidence for each of the instrument's
ten indicators and their corresponding attributes. Each indicator has a page ("t-sheet")
for recording the evidence. At the bottom of the page, the assessor is asked to consider
the evidence in order to rate the teacher as "Acceptable" or "Unacceptable" on the
indicator. (Some indicators also allow other ratings such as "Cannot Rate" or "Not
Applicable.")

For this pilot test, two documentation forms were used: one an early version that
was used in Connecticut pilot tests, and the other a revised version that had never been
used before. The revised version differs from the older version in that it first asks the
assessor to consider the evidence for each attribute and to give a rating of "Acceptable"
or "Unacceptable" to that evidence. These attribute ratings are then combined, following
rules established for each indicator, to obtain a rating of "Acceptable" or
"Unacceptable" for the indicator. For example, the indicator, "Questioning Techniques," can




only be rated "Acceptable" if all four of its attributes are also rated "Acceptable." The
older version does not require a rating of the attributes, but only a consideration of the
overall evidence corresponding to the attributes. According to Connecticut, the revised
version makes it easier to rate the indicators because the decision rules are more clearly
defined.

Upon completion of the evidence summaries and the individual indicator ratings,
the assessor fills out a "Summary of Ratings" form which lists all the indicators (and on
the revised version, the attributes) and shows the assessor's rating for each one. By
looking at this "Summary of Ratings" form, one can determine how many "Acceptable"
and "Unacceptable" ratings the teacher obtained. For certification purposes, the State
of Connecticut requires teachers to obtain an "Acceptable" rating on at least seven of
the ten indicators. (Certification in Connecticut, however, is not based on a single
observation. A new teacher in Connecticut is observed six times: twice early in the
year, twice in the middle, and twice near the end of the year. The two observations per
time period are conducted by different observers, whose ratings are sent to an
independent testing service. The testing service aggregates the ratings to obtain a single
rating for each indicator. This aggregated set of ratings is sent to the teacher, who is
urged, but not required, to share the results with the assigned mentor teacher.)
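
The two decision rules stated in this section can be sketched in a few lines. The sketch is hedged in several respects: the all-attributes rule is documented only for "Questioning Techniques" (other indicators follow their own rules, which are not given here); ratings such as "Cannot Rate" or "Not Applicable" are simply treated as not "Acceptable"; the aggregation of six observations by the testing service is not modeled; and the function names and attribute labels are hypothetical.

```python
from typing import Mapping

ACCEPTABLE = "Acceptable"
UNACCEPTABLE = "Unacceptable"
REQUIRED_ACCEPTABLE = 7  # at least seven of the ten indicators


def rate_indicator(attribute_ratings: Mapping[str, str]) -> str:
    """Apply the all-attributes rule quoted for "Questioning Techniques".

    Assumption: other indicators may combine attribute ratings differently.
    """
    if not attribute_ratings:
        raise ValueError("at least one attribute rating is required")
    if all(r == ACCEPTABLE for r in attribute_ratings.values()):
        return ACCEPTABLE
    return UNACCEPTABLE


def meets_standard(indicator_ratings: Mapping[str, str]) -> bool:
    """Return True when at least seven of the ten indicators are Acceptable."""
    if len(indicator_ratings) != 10:
        raise ValueError("the CCI has ten indicators")
    acceptable = sum(1 for r in indicator_ratings.values() if r == ACCEPTABLE)
    return acceptable >= REQUIRED_ACCEPTABLE


# Hypothetical attribute ratings for the "Questioning Techniques" indicator.
questioning = {
    "appropriateness to lesson content": ACCEPTABLE,
    "responding to students": ACCEPTABLE,
    "opportunities for student involvement": ACCEPTABLE,
    "cognitive level": UNACCEPTABLE,
}
print(rate_indicator(questioning))  # Unacceptable
```

Because one "Unacceptable" attribute fails the whole indicator under this rule, the revised form makes the basis for each indicator rating explicit, which is the advantage Connecticut reported.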

The CCI scoring system is very labor intensive. The entire process takes from 
two and one-half to four hours per assessment because the assessor/scorer must write up 
at least ten pages of evidence (one for each indicator) and then carefully analyze the
evidence in order to give a rating to each of the attributes and the indicators. 



Teacher, Assessor, FWL, and RMC Staff Perceptions of Administration 

As reported in the Spring 1989 Administration Report, the majority of teachers 
were satisfied with the administration of this assessment. Half of the assessors were also
satisfied, while half did not like the number of observations scheduled for the week. 
Said one assessor: 

Seven assessments in one week is unrealistic. The quality of 
an assessor's write-up is directly affected as the number of 
observations increases beyond three a week. However, if an
assessor is observing as their only occupation, one a day is 
feasible. 

FWL and RMC staff who served as independent observers for this assessment concurred 
with the above observation, and also noted that if classroom observations were selected 
as a method of assessment for credentialing teachers in California, assessor fatigue 
resulting from traveling between school sites, especially in rural areas, should also be 
considered. 

The assessors reported two difficulties in administering this assessment: (1) the 
amount of time it takes to write evidence and rate an observation, and (2) not being 
able to give teachers some feedback after the observation. Several of the teachers also 
expressed a strong desire for feedback. 






FWL and RMC staff also found the amount of time to write evidence (i.e., up to
four hours) to be a difficult part of the administration of this assessment. In addition, as
mentioned earlier, the logistics of scheduling the teachers in rural areas presented some
difficulty. Many rural schools are so far apart that even scheduling one observation a 
day required careful calculation. 



Assessment Content 



The content of the CCI focuses on teaching behaviors that are directly observable
in the classroom: management of the classroom environment, instruction, and assessment
of student progress. The content is firmly grounded in the research literature on
effective teaching, and it incorporates the experience and ideas of Connecticut teachers, 
district administrators and teacher educators. 

The development of the CCI content stemmed from a 1984 validation of the 
Connecticut Teaching Competencies (Streifer 1984). In 1985-86, a grant from the
National Institute of Education to evaluate the feasibility of establishing an induction
program for Connecticut's beginning teachers led to a first draft of an assessment
instrument. This instrument was greatly modified after a 1987 conference in which
national experts met with Connecticut State Department of Education (CSDE) staff to
discuss philosophy and approaches to performance assessment, instrument development 
and standard setting. As a result of this conference, a small working group of practi- 
tioners, CSDE staff and researchers created a first draft of the CCI. 

A second panel of national experts in teacher assessment, observation methodology,
research design, and implementation of state assessment programs critiqued the
draft CCI in 1987. The instrument was given a small-scale pilot test in December of
that year. At about the same time, the draft was critiqued by Connecticut representa- 
tives of higher education, professional organizations, local district staff and state depart- 
ment personnel. More revisions were made to the CCI, another small-scale pilot test 
was conducted in 1988, and, after more revisions, a full pilot test was conducted with 
220 beginning Connecticut teachers in 1988-89. 

Also in 1988, over 1,500 Connecticut educators participated in a content validity 
study in which they rated the appropriateness of the CCI's indicators to the job of
teaching in Connecticut. As part of this validity study, the generalizability of the
instrument was also evaluated, and a bias review was completed.

The content of the CCI was still being revised in 1989. As mentioned earlier, two
versions of the CCI were used for this pilot test: (1) a version that does not include 
ratings by attributes and that had been used in previous Connecticut assessments, and 
(2) a recently revised version that does include ratings by attributes and that had never 
been used. Because only two of the CT assessors used the older version, this section
focuses on the more recent version. Although the two forms differ in some of the
attributes which define each indicator and in the criteria for rating, both forms focus on
the same ten indicators named above. 






In the following pages, the content of the CCI is evaluated on the basis of seven 
factors: 

o Congruence with California curriculum guides and frameworks;
o Extent of coverage of California Standards for Beginning Teachers;
o Job-relatedness of the instrument;
o Appropriateness for beginning teachers;
o Appropriateness across different teaching contexts (e.g., grade levels,
  subject areas);
o Fairness across groups of teachers (e.g., ethnic groups, gender); and
o Appropriateness as a method of assessment.

We would like to note that, just as Connecticut educators reviewed the CCI for 
job relevance and importance, if the CCI is to be further field tested in California, a 
validity study should be done at the same time. (For more on validity, see the section,
Technical Quality.) Without such a study, and with our pilot test sample of 41 teachers,
our ability to comment on the CCI's appropriateness along such dimensions as
job-relatedness, appropriateness for beginning teachers, and appropriateness across contexts is
limited. Thus, except for the first two dimensions of curriculum congruence and stand- 
ards coverage, the discussions of the remaining dimensions are based on the perspective 
of the participating teachers, the CT assessors, and FWL and RMC staff, as reflected in 
feedback forms, in informal conversations with the assessors, in meetings with and a 
report from RMC staff, and in data from the CCI ratings sheets. 



Congruence with California Curriculum Guides and Frameworks 

The California curriculum guides and frameworks are, by definition, subject spe- 
cific, which the CCI is not (the CCI focuses on generic teaching behaviors which can be 
applied across subjects). Nevertheless, FWL staff looked at the CCI to see if there is 
congruence with the guides and frameworks, and how the CCI could be modified to 
improve congruence. For our analysis, we examined the following four California guides
and frameworks: English-Language Arts Guide, Mathematics Framework, Science
Guide, and the History-Social Sciences Framework. Because the guides and frameworks 
were developed independently by subject-matter panels, they vary markedly in their foci 
and degree of specificity. We did not look at the curriculum guides in other areas, but 
we would expect similar results to those discussed below. 

Table 3.3 briefly describes the content of each of the four guides and frameworks, 
and also lists the CCI indicators which address the content. As the table indicates, there
is only partial congruence between the CCI and the guides and frameworks. It should
be noted, however, that the CCI is a generic, non-curriculum-specific, high-inference
observation system. As such, it does not measure a teacher's knowledge of curriculum
directly. On the other hand, the CCI includes several indicators to assess a teacher's
presentation of the content of a lesson: "Structure for Learning," "Lesson Content,"




TABLE 3.3

CONGRUENCE OF CCI WITH CALIFORNIA CURRICULUM
GUIDES AND FRAMEWORKS

Curriculum Guide      Content                     CCI                       Comments
or Framework          Description                 Indicators
--------------------  --------------------------  ------------------------  ----------------------------
1. English-Language   22 guides for instruction   None                      None
   Arts Guide         in grades K-8.

2. Mathematics        S major emphases of         None                      None
   Framework          curricular content.

                      10 characteristics of       Lesson Content,           The three indicators are
                      instruction.                Questioning Techniques,   congruent with the three
                                                  Monitoring and            characteristics:
                                                  Adjusting                 Mathematical Language,
                                                                            Questioning and
                                                                            Responding, and
                                                                            Corrective Instruction.

3. Science Guide      Content-knowledge           None
                      descriptions of biology,
                      earth, and physical
                      science programs for
                      grades K-8 (includes
                      ideas on how to teach
                      the subject matter).

                      General characteristics     Lesson Content,           Indicators corresponding
                      of a strong science         Questioning Techniques    to characteristics focus on
                      program.                                              development of students'
                                                                            emotional, physical, and
                                                                            intellectual development
                                                                            and questioning techniques
                                                                            and responses.

4. History-Social     Three curricular goals,     None                      None
   Sciences           their corresponding
   Framework          learning strands, and
                      a sequential curriculum
                      for grades K-12.



"Lesson Development," "Questioning Techniques," and "Communicating Clearly."
Assessment of the content through these indicators could be strengthened if the observa- 
tion were conducted by an observer with special expertise in the subject matter of the 
lesson. To obtain more extensive evidence of a teacher's knowledge and ability to teach 
specific curriculum content, other measures would be needed. These measures could be 
alternatives to observations, such as interviews, written assessments or combinations of 
these with an observation system. 



Extent of Coverage of California Standards for Beginning Teachers

As mentioned earlier, the CCI was specifically developed with the Connecticut
Teaching Competencies in mind. It was not developed to assess the standards for 
beginning teachers that have been established by California's Commission on Teacher 
Credentialing. Although there are similarities between the Connecticut competencies 
and the California Standards, they are not identical. FWL staff examined the CCI indi- 
cators to see how well they assess Standards 22 through 32 of the California Beginning 
Teacher Standards, which define levels of pedagogical competence and performance that 
California teacher credential candidates are expected to attain. These standards are 
reprinted below (in italics), along with an analysis of how the CCI indicators correspond 
to each standard. 

Standard 22: Student Rapport and Classroom Environment. Each candidate
establishes and sustains a level of student rapport and a classroom environment that
promotes learning and equity, and that fosters mutual respect among the persons in a
class. The CCI indicator, "Positive Learning Environment," requires the observer to
look for evidence of student rapport and of a classroom environment that is conducive 
to learning. The indicator, "Behavior Standards," requires evidence that the teacher has 
established standards of student behavior and applies fitting consequences to both 
appropriate and inappropriate behaviors. 

Standard 23: Curricular and Instructional Planning Skills. Each candidate
prepares at least one unit plan and several lesson plans that include goals, objectives,
strategies, activities, materials and assessment plans that are well defined and
coordinated with each other. The CCI process requires the teacher to plan a 45-60 minute lesson
for observation and to specify on a pre-observation form the objectives, activities,
instructional arrangements, and materials that are part of the lesson. In addition, the
indicator, "Lesson Development," requires the observer to look for evidence that the
teacher has developed the lesson in a logical or sensible order, and that the materials
and instructional arrangements used for the lesson are consistent with the planned or
emerging lesson.

Standard 24: Diverse and Appropriate Teaching. Each candidate prepares and 
uses instructional strategies, activities and materials that are appropriate for students 
with diverse needs, interests and learning styles. The CCI indicator, "Lesson Content," 
requires the observer to seek evidence that the teacher's choice of content (de- 
fined by the instrument as "student learning activities, lesson materials, teacher presenta- 
tion, and teacher questioning" as manifested in the lesson) is appropriate to the stu- 
dents' level of development. This indicator does not directly address, however, the issue 
of students with diverse learning styles and interests, although evidence for this standard






may be found during the observation. It also does not assess whether the teacher's 
strategies, techniques, and materials are "free from bias." 

Standard 25: Student Motivation, Involvement, and Conduct. Each candidate
motivates and sustains student interest, involvement and appropriate conduct equitably 
during a variety of class activities. Several CCI indicators address this standard: The 
indicator, "Positive Learning Environment," requires the observer to look for evidence 
that the teacher "creates a climate that encourages all students to achieve"; the indica- 
tor, "Appropriate Standards of Behavior," asks the observer to look for evidence that 
the teacher "communicates and reinforces appropriate standards of behavior for the 
students"; the indicator, "Student Engagement," requires the observer to look for evi- 
dence that the teacher involves "a clear majority (at least 80%) of the students in the 
instructional activities of the lesson"; and the indicator, "Appropriate Questioning 
Techniques," asks the observer to seek evidence that the teacher, through questioning 
techniques, provides opportunities for most students (including those of different ethnic 
groups and genders) to be involved in the lesson. 

Standard 26: Presentation Skills. Each candidate communicates effectively by
presenting ideas and instructions clearly and meaningfully to students. The CCI
indicator, "Communication Skills," requires the observer to seek evidence that the teacher
communicates in a "coherent manner, avoiding vagueness and ambiguity that interfere
with student understanding." This indicator also assesses the teacher's technical quality
of communication, focusing on articulation, volume, and rate of delivery. This indicator
does not address, however, the teacher's written language, and so the indicator would 
have to be changed to include a focus on the teacher's written language in order to 
meet the standard. This standard is also addressed by the CCI indicator, "Appropriate 
Lesson Content," which asks the observer to ascertain if the teacher uses "vocabulary 
and language appropriate to the learners." 

Standard 27: Student Diagnosis, Achievement and Evaluation. Each candidate
identifies students' prior attainments, achieves significant instructional objectives, and
evaluates the achievements of the students in a class. The CCI indicator, "Positive
Learning Environment," asks the observer to supply evidence that the teacher "creates a 
climate that encourages all students to achieve" (i.e., communicates expectations for 
achievement). The indicator, "Monitoring and Adjusting," asks the observer to look for 
evidence that the teacher "checks the level of student understanding at appropriate 
points during the lesson," and, when monitoring indicates that students are
misunderstanding or failing to learn, or that students have mastered the concepts being taught,
that the teacher uses "appropriate strategies to adjust his or her teaching." The CCI 
does not assess the methods a teacher uses to ascertain students' prior attainments 
related to the subject of the lesson or the methods used to formally evaluate student 
work. 

Standard 28: Cognitive Outcomes of Teaching. Each candidate improves the 
ability of students in a class to evaluate information, think analytically and reach sound 
conclusions. The CCI indicator, "Structure for Learning," requires the observer to find
evidence that the teacher's lesson includes closure(s) which could help the students to 
evaluate information, think analytically, and reach sound conclusions. It does not evalu- 
ate a teacher's ability to design instruction that increases the critical thinking skills and 
problem-solving ability of students unless that is the objective of the lesson observed; if 






that is the lesson's objective, then the indicator, "Questioning Skills," requires the
observer to find evidence that the teacher asks high-order cognitive questions.

Standard 29: Affective Outcomes of Teaching. Each candidate fosters positive
student attitudes toward the subjects learned, the students themselves, and their capacity
to become independent learners. The CCI indicator, "Positive Learning Environment,"
requires the observer to find evidence that the teacher demonstrates "patience,
acceptance, empathy and interest in students through positive verbal and non-verbal
exchanges;" "avoids sarcasm, disparaging remarks, sexist or racial comments,
scapegoating or physical abuses;" "exhibits her or his own enthusiasm for the content and for
learning;" and "maintains a positive social and emotional tone in the learning
environment." It does not assess, however, whether a teacher encourages positive interaction
among students or independent learning experiences. 

Standard 30: Capacity to Teach Crossculturally. Each candidate demonstrates
compatibility with, and ability to teach, students who are different from the candidate.
The differences between students and the candidate should include ethnic, cultural,
gender, linguistic and socio-economic differences. Although no CCI indicator addresses
this standard directly, the CCI process and several indicators (e.g., "Positive Learning
Environment," "Lesson Content," and "Questioning Techniques") allow the observer to
note whether the teacher demonstrates rapport with, and the ability to teach, students
who are different from the teacher. If, however, the classroom is homogeneous with
respect to ethnicity or culture or socioeconomic differences (e.g., several classrooms in
Northern California appeared to be ethnically homogeneous), then the CCI cannot even
indirectly assess this ability.

Standard 31: Readiness for Diverse Responsibilities. Each candidate teaches 
students of diverse ages and abilities, and assumes the responsibilities of full-time teach- 
ers. This standard focuses on a teacher's ability to teach classes which span the range 
covered by the credential (i.e., grades K-8 or 7-12) or students at two or more ability 
levels (such as remedial and college preparatory classes). None of the CCI indicators 
are designed to assess this ability. (It would be possible, however, to compare the 
observations of a teacher who teaches both remedial and college preparatory classes.) 
This standard also addresses a teacher's ability to fulfill typical responsibilities of teach- 
ers such as meeting school deadlines and keeping student records, none of which are as- 
sessed by any CCI indicator. 

Standard 32: Professional Obligations. Each candidate adheres to high standards
of professional conduct, cooperates effectively with other adults in the school community,
and develops professionally through self-assessment and collegial interactions with other
members of the profession. None of the CCI indicators assess whether a teacher fulfills
his/her obligations as a member of a profession and a school community (e.g., adheres
to high standards of professional conduct and engages in collegial relationships).

The extent of coverage by the CCI of the California Beginning Teacher Standards 
is summarized in Table 3.4. The table lists the CCI indicators that address each
standard and also describes the extent of coverage provided.






TABLE 3.4

EXTENT OF COVERAGE BY THE CCI OF
CALIFORNIA STANDARDS FOR BEGINNING TEACHERS

Standard                                  CCI Indicator(s)                  Extent of
                                          Assessing Standard                Coverage
----------------------------------------  --------------------------------  ---------
22: Student Rapport and Classroom         -Positive Learning Environment    Full
    Environment                           -Behavior Standards

23: Curricular and Instructional          -Lesson Development               Partial
    Planning Skills

24: Diverse and Appropriate Teaching      -Lesson Content                   Partial

25: Student Motivation, Involvement,      -Positive Learning Environment    Full
    and Conduct                           -Behavior Standards
                                          -Student Engagement
                                          -Questioning Techniques

26: Presentation Skills                   -Lesson Content                   Partial
                                          -Communication Skills

27: Student Diagnosis, Achievement,       -Positive Learning Environment    Partial
    and Evaluation                        -Monitoring and Adjusting

28: Cognitive Outcomes of Teaching        -Structure for Learning           Partial
                                          -Questioning Techniques

29: Affective Outcomes of Teaching        -Positive Learning Environment    Partial

30: Capacity to Teach Crossculturally     -None                             None

31: Readiness for Diverse                 -None                             None
    Responsibilities

32: Professional Obligations              -None                             None






Sixteen of the 18 teachers who evaluated the CCI stated that all the major 
competencies measured by this assessment are relevant to their job of teaching. Some 
of the teachers described the job-relatedness of the CCI as "excellent." Other teacher
comments were as follows: 

All areas are relevant to my job of teaching. 

I believe that the instrument covered all areas. 

Yes! I felt that my entire mode of teaching was being evalu- 
ated, not just my lesson. 

Because the CCI assessment entails observing teachers actually teaching in their 
own classrooms, the job-relatedness of this assessment is strong. Job relevance is a 
particularly important factor in evaluating different approaches to teacher competence 
assessment, because professional practitioners and courts of law consider this factor first 
when they judge the fairness of an evaluation system. Furthermore, as a classroom 
observation system, the CCI offers direct evidence of actual teaching competence. For 
this reason, it is not necessary to make inferences about how well a teacher conducts 
instruction if such an assessment is used. Making inferences about the quality of a 
teacher's actual teaching is a primary characteristic of all other approaches to teacher 
assessment, as will be shown in subsequent chapters of this report.



Appropriateness for Beginning Teachers

Teachers were asked if they felt they had an opportunity to acquire the knowl- 
edge and abilities measured by the CCI. A slight majority of teachers (10 of 18) re- 
sponded affirmatively. One teacher remarked: 

Experience will certainly help a teacher become more effective
in all areas but this is good for the beginning teacher to
begin to focus on the specific skills listed in the assessment
instrument.

Three teachers responded negatively to the appropriateness for beginning teachers
question; five teachers either did not respond or gave answers which did not address the 
question. 

The CT assessors were also asked if they thought the CCI assessment was
appropriate for beginning teachers. Their responses were generally affirmative - but with
qualifications. Several of the assessors stated that the assumption that beginning teach- 
ers can acquire the knowledge and skills needed to demonstrate competence by the end 
of the first year is a valid one, but only if the teachers have received mentoring, supervi- 
sion and support during the year as they do in Connecticut. Explained one assessor: 

Although the indicators are written with vocabulary and 
terminology that is global, the variety of ways each indicator 






can be expressed in terms of specific behaviors is best ad- 
dressed cooperatively so that someone with more extensive 
classroom experience can broaden a beginning teacher's 
experience. 



Some assessors also mentioned that the California teachers had particular
difficulty with the indicators "Structure for Learning" and "Questioning," and, as a result,
questioned whether the teachers received sufficient training in these areas. The
assessors perceived the teachers' difficulty with these indicators as further indication that
teachers need more assistance or preservice education.

Our analysis of the CCI ratings (scores) revealed that, for the most part, the
beginning teachers performed well on the CCI. Approximately 80% (33 of the 41
teachers) received "acceptable" ratings on at least seven of the ten indicators. Almost
[figure illegible] received "acceptable" ratings on all ten indicators. Of the teachers
who did not perform as well, approximately 15 percent (six teachers) received four to
seven unacceptable ratings, and about five percent (two teachers) received as many as
[number illegible] "unacceptable" ratings. No teacher was rated as "unacceptable" on all ten
indicators.

Our scoring analysis also suggested that of the ten indicators, teachers had the
most trouble with one indicator in particular. As Chart 3.2 indicates, of the 41 teachers
who were observed, only 20 teachers (49%) received an "acceptable" rating on the
"Structure for Learning" indicator. This may suggest that many beginning teachers need
more training or experience in providing initiations and closures to lessons (i.e.,
"Structure for Learning"). Alternatively, this skill may develop with experience, so the CCI
standards for this teaching ability (indicator) may be inappropriate or too high for
beginning teachers. (It is the opinion of FWL staff, however, that the first explanation is
more likely than the second.)
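
The percentages quoted in the two paragraphs above can be checked against the reported counts with a few lines of arithmetic. The sketch below is only an illustration in Python: the variable and function names are assumptions, and the counts are the ones stated in the text (41 teachers observed; groups of 33, six, and two; 20 teachers rated "acceptable" on "Structure for Learning").

```python
# Counts are taken from the text above; all names are illustrative.
TOTAL_OBSERVED = 41


def percent(count, total=TOTAL_OBSERVED):
    """Percentage of observed teachers, rounded to one decimal place."""
    return round(100 * count / total, 1)


met_threshold = 33               # "acceptable" on at least seven of ten indicators
four_to_seven_unacceptable = 6   # reported as approximately 15 percent
remaining_low_group = 2          # reported as about five percent
structure_acceptable = 20        # "acceptable" on "Structure for Learning"

# The three rating groups together account for every observed teacher.
assert met_threshold + four_to_seven_unacceptable + remaining_low_group == TOTAL_OBSERVED

for label, count in [
    ("met seven-of-ten threshold", met_threshold),
    ("four to seven unacceptable", four_to_seven_unacceptable),
    ("remaining low group", remaining_low_group),
    ("acceptable on Structure for Learning", structure_acceptable),
]:
    print(f"{label}: {count} of {TOTAL_OBSERVED} = {percent(count)}%")
```

The computed values (80.5, 14.6, 4.9, and 48.8 percent) agree with the rounded figures given in the text.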



Appropriateness across Contexts 

Almost all of the teachers (15 of 18) felt the CCI approach is useful for teachers
in different contexts (i.e., across grade levels, subject areas, and diverse student groups).
Some of the teachers, however, qualified their answers, saying that its usefulness was
dependent upon the teacher receiving feedback after the observation; for example, one
teacher remarked, "It would be [useful] if I actually got feedback from it." The following
pages examine the issue of the CCI's appropriateness for different grades, subjects, and
student groups.

Grade level and subject matter. As indicated above, most teachers believe the
CCI is appropriate for new teachers in varying grades and subject areas. (See Table 3.2
for a complete listing by grade level of all subjects observed.) This opinion was also
expressed by almost all of the CT assessors. Commented one assessor: 

Considering that I have teaching experience spanning
preschool through nth grade, I am confident that this
instrument is relevant to all grades and subject areas.






CHART 3.2

CCI SCORING RESULTS

PERCENT OF TEACHERS RECEIVING "ACCEPTABLE" RATINGS PER INDICATOR

[Bar charts not reproduced. Three panels: Elementary School; Middle School or
Jr. High; High School (N=9). Vertical axis: percent of teachers, 0% to 100%.
Horizontal axis: indicators 1 through 10, keyed below.]

 1 = Learning Environment
 2 = Behavior Standards
 3 = Student Engagement
 4 = Routines/Transitions
 5 = Structure for Learning
 6 = Lesson Content
 7 = Lesson Development
 8 = Questioning
 9 = Communication
10 = Monitoring and Adjusting






One assessor, however, questioned whether the instrument was appropriate for all
subject areas. This assessor observed a teacher in an agricultural management class and
found that no teaching of the sort defined by the Instruction and Assessment clusters
occurred. According to her report, the observed lesson did not contain any explicit
instruction from the teacher. Instead, the teacher acted as a supervisor while the
students engaged in a particular activity (in this case, worming sheep).

The assessor stated that based on her many years of experience as a high school
administrator observing industrial arts and business classes, many -- if not most --
vocational classes are like this; that is, the approach seems to be, "Just do the task and you'll
learn." Exactly what is learned, however, is not necessarily specified. Although the
observed lesson took place on what the teacher described as an "activity day," the assessor
asked if there aren't some classes (e.g., vocational education classes) that have less
emphasis on direct instruction than do other classes. She questioned whether these
classes should be assessed in the same manner as other classes. In other words, is it
appropriate to rate the teacher of the agricultural management class on the use of
questioning techniques, for example, when the nature of the lesson, as perceived by the
teacher, required little questioning? Or is the definition of good teaching such that the
teacher should have been expected to use questioning techniques during a "hands on"
learning activity? Based on one's answers to these concerns, the CT assessor suggested
that the CCI may or may not be appropriate across subject areas.

To begin to address the question raised by this assessor, we examined the
documentation for the teacher who taught the agricultural management lesson. Although the
assessor checked a "Cannot Rate" for lesson content, there were four other areas in
which the teacher was given an "Unacceptable" rating: "Structure for Learning,"
"Lesson Development," "Questioning Techniques," and "Monitoring and Adjusting."
The evidence given for the negative ratings seemed to justify the ratings. For example,
the teacher's initiation and closure ("Structure for Learning") were purely administrative
(e.g., after the last lamb was vaccinated, the teacher said, "That's it. Go in and get
washed up."). Even a very activity-based lesson such as this (i.e., worming sheep) would
benefit from a structural framework which facilitates learning. Another example is that
although the teacher stood in the presence of the students and could see if they were
performing the procedure correctly, he was not observed monitoring whether or not they
understood the procedure. Such monitoring would also facilitate learning, even in an
activity-based class. Thus, although the data are too limited to draw a definitive
conclusion on this issue, the CCI does seem capable, at least to some degree, of assessing
teachers of a variety of different subjects, including vocational education and other
activity-based classes.

In addition to looking at the particular documentation for the lesson in question,
we also looked at the distribution of ratings across subject areas of the teachers who
received four or more "Unacceptable" ratings (i.e., failed the assessment by Connecticut
standards). The teachers who fell into this category included four of the 19 teachers 
who were observed teaching English/language arts (includes reading), two of the six 
teaching science, one of the three teaching social studies, and the one teaching agricul- 
tural management. Moreover, of the four English/language arts lessons, two were taught 
by elementary teachers and two by high school teachers. Although the data are too 
limited to draw any conclusions about the relative degree of difficulty across subject
areas, there was only one instance where an assessor reported difficulty in applying the 
CCI to the observed lesson, and that was for the agricultural management class. 




In connection with this question of appropriateness across subjects, the CT
assessors were asked how much knowledge of the subject matter an assessor should have to
administer this assessment. Several of the assessors felt that there are certain content
areas where a teacher-assessor match is a must. In particular, the assessors noted the
importance of subject-matter knowledge at the secondary level. One assessor stated that 
her knowledge of physical science was crucial to her analysis of a middle school science 
lesson, and her lack of content knowledge was an impediment to her analysis of a high 
school vocational agriculture lesson. This assessor also questioned her ability to judge 
the level of difficulty of a first-grade lesson considering that her background was as a
chemistry/physics teacher and high school administrator. 

Two of the assessors, however, strongly dissented. One stated that knowledge of 
the subject matter is not essential to administering this assessment, but that knowledge 
of the instrument is. The other maintained that through focusing on cues such as the 
pattern of student engagement (i.e., are most students wearing puzzled expressions and
withdrawing from active participation in the lesson?) and the content of questions asked
by students, she could infer whether or not the lesson content and development were
satisfactory. This particular assessor had experience teaching in both secondary and 
middle schools, so it is possible that she was speaking from a broader range of experi- 
ence than other assessors. 

Although there are merits to both sides of the argument, it would be difficult to 
judge the accuracy of the lesson content (as required by the indicator, "Appropriate 
Lesson Content") or the appropriateness of the questions asked to the lesson content 
(as required by the indicator, "Questioning Techniques") if one is not familiar with that 
content. Further, how could an observer judge the communication skills (as required by 
the communication indicator) of a teacher using a language other than English during 
much of the observation period if the observer does not speak that language? An 
observer does have the option of giving a "Cannot Rate" rating to the "Lesson Content" 
indicator, but not to the other indicators. Thus, it seems desirable that, whenever possi- 
ble, the observer have some familiarity with the subject area in order to provide higher 
quality observations and more reasonable ratings on the attributes and indicators. In 
addition, if feedback is to be provided to the teacher, it would probably be more useful 
if it is based on observation(s) made by someone familiar with the subject. 

In our analysis of the CCI ratings, we also focused on the appropriateness of the 
CCI assessment across grade levels. We found that on some indicators there was a 
difference in how well teachers at different grade levels (i.e., elementary, middle, and 
senior high school) performed. Chart 2 shows the total number of teachers who received 
an "Acceptable" rating for each indicator, and also the number and percentage 
of teachers with the rating at the elementary, middle school, and high school levels. As 
the table shows, our sample of high school teachers tended to perform less well than our 
sample of middle school and elementary school teachers on four of the ten indicators. 
The percentage of middle school and elementary school teachers receiving an "Acceptable" 
rating was much higher than the percentage of high school teachers for the following 
indicators: "Lesson Content," "Lesson Development," "Questioning Techniques," 
and "Monitoring and Adjusting." In addition, one third of the high school teachers 
received an overall total of four to seven "unacceptable" ratings compared to only 
one of the eight middle school teachers and two of the 24 elementary teachers. Due to 
the small size of each group, no conclusions can be drawn from the data. It may be 
beneficial, however, for any further pilot testing of high-inference observation instruments 
to include a focus on grade level comparisons. For example, future pilot tests 
could be conducted to see if a match between teacher and assessor with regard to grade 
level experience yields the same or different results as this pilot test. Similarly, additional 
pilot tests might be conducted to assess whether an assessor-teacher match with 
[text illegible in source] level [text illegible in source] particularly high school 
[text illegible in source]. 

Diverse students. The philosophy of the CCI, as stated in the CCI training 
handbook for assessors, includes the assumption that "Effective teaching is sensitive to 
cultural diversity." The handbook also states that "Competent beginning teachers will 
help prepare children for participation in a culturally diverse world and will also teach in 
ways that help all children learn." To the latter end, the CCI puts a special emphasis 
on the concept, "all children." For example, the effective teacher is expected to be 
accepting of and interested in all students, to encourage all children to achieve at the 
highest level they can, to engage all children in the learning activities, and so on. 
However, as noted in the earlier discussion of how well the CCI addresses "Standard 30: 
Capacity to Teach Cross-culturally" of the California Beginning Teacher Standards, the 
CCI does not completely capture a teacher's capacity to respond appropriately to diverse 
students because its sole focus is on those students present in the teacher's current 
classroom. If those students are all or mostly homogeneous, the CCI is completely 
unable to assess how a teacher responds to diverse students. Furthermore, although the 
underlying philosophy of the CCI may include the assumption that a competent beginning 
teacher will help prepare children for participation in a culturally diverse world, 
there is nothing in the content of the CCI that assesses a teacher's ability to do so. 

The CCI could, however, be modified in order to address these issues. For instance, 
the first indicator, "Positive Learning Environment," might be modified to include 
a stipulation that, not only does the teacher "maintain a positive social and emotional 
tone in the learning environment," but also maintains a learning environment that 
is both fair to different types of students -- by gender, ethnicity, handicapping conditions, 
language group, etc. -- and is reflective of a culturally diverse world. 

To further strengthen the CCI's capacity to assess a teacher's ability to teach diverse 
students, it also seems desirable to have, whenever possible, an observer who is 
familiar with the type of student group being observed. For example, special education 
students, limited English proficient students, and some students of particular ethnic 
groups may tend to exhibit certain characteristics. An observer's knowledge of student 
development and of desirable and appropriate behaviors for those in the classroom 
being observed would likely contribute to higher quality observations. 

Fairness across Groups of Teachers 

The majority of teachers responded positively to the question of fairness of the 
CCI across groups of teachers (e.g., different ethnic groups, different language groups, 
etc.). Thirteen of the 18 teachers believed the assessment is fair, two did not, and three 
did not give an answer (or gave an answer that did not address the question). The six 
Connecticut assessors also felt the CCI is fair across groups of teachers. 




FWL staff are unable to comment on the fairness of the CCI across groups 
because there is not enough information about the teachers' ethnic backgrounds, lan- 
guage abilities, etc., to enable us to examine teacher performance with regard to these 
dimensions. 



Areas of Most/Least Emphasis 

Because the CCI assesses a variety of areas, the teachers were asked what areas 
they feel should receive the most/least emphasis in making decisions on credentialing. 
The teachers gave a wide variety of answers (there were also eight teachers who did not 
answer), some of which, together with the number of teachers who gave them, are listed 
below: 

Most Emphasis 

- the way a teacher relates to students 

- teaching methods or instructional techniques 

- major competency areas 

- positive attitude 

- accuracy 

- flexibility 

- student engagement 

- the tone of voice in the classroom 

- monitoring for understanding 

Least Emphasis 

- classroom management (2) 

- style (1) 

- high level knowledge of content (1) 

The majority of CT assessors generally felt that none of the areas should receive 
most/least emphasis. In defense of their opinion, two of the assessors made reference 
(either directly or indirectly) to the "integrated, holistic nature of teaching," and thus the 
importance of all the areas. One assessor explained, "A teacher with a dynamite plan 
but no discipline is no more effective than a teacher with great discipline and no plan." 

One assessor disagreed and specified three areas she thought should receive the 
most emphasis: maintaining appropriate standards of behavior, promoting a positive 
learning environment, and monitoring student understanding and adjusting teaching. 
Regarding the latter, she commented, "No matter how good the teacher thinks the 
lesson is, without monitoring and adjusting she'll never know." 



Assessment Format 



Traditionally used by school administrators, the classroom observation method of 
assessment is generally accepted by teachers, administrators, parents, and the general 
public as an appropriate method to assess teacher competence. It is relatively easy to 
administer because it requires minimal materials (paper and pen for the assessor) and 
no special setting. Moreover, the person making the observation usually focuses on one 
specific area or takes general notes on a variety of areas. The classroom observation 
[text illegible in source] fairly easy to administer, but its format 
renders it more difficult than most traditional systems because the CCI requires the 
assessor to script, as best as possible, the entire lesson. In addition, the CCI analysis 
entails much more writing than traditional systems, and, perhaps, more careful codification 
[text illegible in source]. 

From another perspective, the CCI cannot be easily administered to groups of 
teachers because its format requires one assessor observing one teacher at a time. 
Because the format requires the assessment to take place in the teacher's classroom 
(which is convenient for the teacher), the assessor must be able to travel whatever distance 
is necessary to observe at the teacher's school site. Needless to say, in the state of 
California, these format issues pose a formidable challenge. 

Other format issues more easily addressed are the clarity of the assessment, the 
clarity of the assessment materials, and the question of giving feedback as part of the 
assessment. 

Clarity of the Assessment 

Before the observations, each participating teacher received the Connecticut 
Competency Instrument, which described the aspects of teaching being assessed (i.e., the 
10 indicators). However, only nine of the 18 teachers who returned the evaluation 
forms responded positively when asked if they knew what aspects of teaching were being 
measured by the assessment (six teachers said "no," and three did not respond). Four 
of the nine teachers who said "yes" were also able to identify the aspects that they 
believed were being measured. Of the six teachers with negative responses, two ex- 
pressed some confusion as to what was being evaluated - the teacher or the CCI? 

The responses to the above question lend credence to the recommendation made 
by some of the assessors that the teachers have assistance in reviewing the instrument. 
[text illegible in source] of indicators are probably too much for a teacher 
[text illegible in source] important that a teacher about to be 
assessed [text illegible in source] have the opportunity to discuss the CCI process and the 
instrument itself with someone who is familiar with the CCI assessment. 

Clarity of Assessment Materials 

[text illegible in source] of the teachers reported that the CCI assessment materials 
they received prior to the assessment were helpful. An even larger number of teachers 
[text illegible in source] pre-assessment information form was helpful. Teachers were 
also asked to comment specifically on the pre- and post-observation forms. Almost all 
of the teachers (15) found the questions on the forms to be understandable. One 
teacher had praise for both the questions in general and the post-observation questions: 






The questions were clear; I liked being able to explain spe- 
cial circumstances or changes in plans. 

Based on the comments from the teachers and CT assessors, the pre-assessment 
information form and the pre- and post-observation interview forms seem to be especially 
valuable and key to the observation and evaluation of the teacher's behaviors. The 
pre-observation form gives structure and meaning to the teacher's lesson, and the post- 
observation form allows the assessor to understand the teacher's response to the class- 
room context, and to gauge the extent to which instruction was altered according to that 
context. If California decides to use a classroom observation instrument in teacher 
credentialing, there are several ways in which the materials should differ from the CCI 
materials. 

Above all else, an identification number should be assigned to each new teacher 
to permit the linking of various documents and it should appear on each page of all the 
forms. Other suggestions for changes to some of the forms are as follows: 

Pre-assessment Information Form -- Change so that it collects information on the 
name of the school, the name of the city, and the composition of the class (e.g., gender, 
ethnic groups, languages spoken, students receiving special services). Revise question 
#3 of this form, which asks what other adults will be in the classroom during the observation, 
to ask about other students that are not part of the teacher's regular class. 
Require, whenever possible, that the teacher attach relevant materials, such as copies of 
worksheets that are connected with the observed lesson. 

Pre-observation Interview Form -- Rewrite for better clarity. Question #1, for 
example, asks the teacher if she/he has made any changes to the lesson plan described 
in the Pre-Assessment Information Form. If the teacher's response to that question is 
that there are no changes, Question #3 might be confusing because it asks the teacher 
about "other changes" that the interviewer should be aware of. 

Scripting Sheets -- Current scripting sheets are notepad pages of the assessor's 
choice to which the assessor adds columns to match the CCI format. Replace with pre-printed 
scripting sheets designed to conform to the official format. Include a space to 
collect information on the number of students on task, which would also serve to remind 
the observer to collect this information on a regular basis. 

Assessor Rating Summary -- Modify to include a place to indicate if there is an 
Incident Report or not. (An Incident Report is completed by the observer if there is any 
irregularity that could affect the validity and accuracy of the observation and ratings.) 

Documentation Checklist -- Connecticut provides a checklist for observers, listing 
their procedures and responsibilities. Prepare a similar document for California observ- 
ers including timelines and addresses for sending documentation, a one-page list of clus- 
ters, indicators, and attributes in outline format with codes assigned to each, and a list of 
standard abbreviations for use in scripting to ensure some consistency across observers 
and for use in interpreting and reviewing another person's script. 






Observation Feedback 



The format of this assessment did not include giving feedback to the teachers, 
because the purpose of the pilot test was to evaluate the CCI instrument. Almost all of 
the teachers (16 of 18) indicated they would have liked some feedback, and two of the 
Connecticut assessors indicated that not being able to give the teachers feedback was the 
most difficult (or one of the most difficult) aspects of this pilot study. 

The teachers were asked to describe what kind of feedback would be helpful 
[text illegible in source] assessment, and also by whom, when, and in what format the feedback 
should be provided. The most common response was that the assessor should provide 
the feedback [text illegible in source] possible (9 teachers), and in a constructive, positive 
form (7 teachers). 

A few teachers were not so concerned with how or what kind of feedback be 
given, just that it be given. Said one teacher: 

Some feedback would be nice, in any form. 

Clearly, teachers desire feedback. However, the content of and process for 
providing observation feedback need to be carefully considered. In Connecticut, the 
feedback process for 1989-90 will consist of the beginning teacher receiving feedback 
after each observation. The feedback will provide information on whether the teacher 
demonstrated sufficient or insufficient skills relating to each defining attribute, or whether 
there was insufficient evidence to arrive at a conclusion. If California considers 
adopting a similar observation feedback process, feedback should be given relatively 
soon after the lesson so the teacher has a clear memory of the lesson to which the 
feedback [text illegible in source] a feedback checklist corresponding to the indicators and 
[text illegible in source] of the specific behaviors that were observed and 
[text illegible in source] acceptable or not acceptable. Thus, if a teacher receives a negative rating 
[text illegible in source] "Positive Learning Environment" indicator, 
she/he has no way of knowing which behaviors observed contributed to that rating. 

Consideration also needs to be given to the use of mentor teachers in the feedback 
process. As mentioned earlier, teachers in Connecticut are encouraged to share 
their feedback results with a mentor teacher in order to get assistance in interpretation 
and guidance for improvement. During the pilot test in Connecticut, however, mentor 
teachers expressed feelings of ambivalence or reluctance about participating in the 
evaluation of beginning teachers. Such feelings may have resulted because (1) the mentor 
teachers were not especially knowledgeable about the CCI and thus not well 
equipped to advise beginning teachers on either interpretation of the assessment results 
or ways to improve the areas found to be unacceptable, and/or (2) the mean number of 
[text illegible in source] observing beginning teachers is two, and thus they 
[text illegible in source] with the beginning teachers to recognize patterns of behavior 
that might illustrate adequate or inadequate performance. Should a classroom 
observation assessment be selected for use in California, a system of feedback should be 
developed that aids teachers in improving their performance. If mentor teachers are to 
[text illegible in source] system, then they will need to receive training in the instrument and 
have time to observe the teacher to be of any real help. 






We have used the experience and time associated with administering and scoring 
the current version of the CCI as a basis for providing some initial estimates of the costs 
of administering a California version of an observation system. We will outline the 
assumptions and basis for estimating the costs. It is important to view these as only 
general, incomplete estimates. To provide for more specific and complete estimates, it 
would be important to assess the feasibility of alternative methods for implementing the 
CCI. These could include varying the method used to administer and score the observa- 
tion, and using alternative methods to allocate and absorb the costs of administration. 
For example, it might be possible to develop an observation system that reduces the 
scoring time from the four hours needed for the current version to one hour. Also, the 
CCI could be combined with other assessments such as interviews, assessment centers or 
written examinations in a manner that would affect the costs of administering each. 



Assessor Time 

Administering the current version of the CCI requires a trained observer or assessor 
to (a) prepare for and arrange for the assessment, (b) review the pre-assessment 
form, (c) conduct the pre-assessment meeting, (d) observe for 45 minutes to one hour, 
and (e) analyze the teacher's performance according to the ten indicators. The current 
analysis system requires as many as four hours to summarize and score an observation. 

Allocating up to four hours for scoring and two hours for the other activities 
would imply that the assessment could be completed within six hours. At an hourly 
rate of $20, the assessor's time would cost $120 per observation, which 
would account for the majority of costs. If the scoring time were reduced to one hour, 
the cost of assessor time would drop to $80 per observation. 
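
The assessor-time arithmetic above can be sketched as a short calculation. This is only an illustration and was not part of the pilot study: the function and parameter names are hypothetical, while the $20 hourly rate, the four scoring hours, and the two hours for other activities are the figures given in the text.

```python
HOURLY_RATE = 20  # dollars per hour, as assumed in the text


def assessor_cost(scoring_hours, other_hours=2, hourly_rate=HOURLY_RATE):
    """Cost of assessor time for one observation (hypothetical helper).

    scoring_hours: hours spent summarizing and scoring the observation
    other_hours:   hours for preparation, interviews, and the observation itself
    """
    return (scoring_hours + other_hours) * hourly_rate


if __name__ == "__main__":
    # Baseline from the text: four scoring hours plus two other hours.
    print(assessor_cost(scoring_hours=4))
```

With these inputs the baseline comes to $120, matching the text. Under the same two-hour allowance for other activities, one hour of scoring gives three hours in total; the $80 figure quoted above corresponds to four assessor hours in total at this rate.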



Training Costs 

The current version of the CCI requires a five-day training session and a two-day 
follow-up session. If we assume that each assessor could be trained and certified in this 
amount of time and that each would conduct 30 observations each year for five years, 
the costs for training the person would be distributed over 150 
observations. Reimbursing the assessors for the seven days of training at $20 per hour 
or $160 per day would add about $7 to the cost of each assessment. 
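
The amortization of training costs can be checked with a minimal sketch. The function and parameter names are hypothetical; the seven training days, the $160 daily rate, and the 30 observations per year over five years are the assumptions stated above.

```python
def training_cost_per_observation(training_days=7, daily_rate=160,
                                  observations_per_year=30, years=5):
    """Spread one assessor's training cost over all of that assessor's observations."""
    total_training_cost = training_days * daily_rate   # dollars paid for training
    total_observations = observations_per_year * years  # observations conducted
    return total_training_cost / total_observations


if __name__ == "__main__":
    # Prints the per-observation share of the training cost, in dollars.
    print(round(training_cost_per_observation(), 2))
```

Seven days at $160 is $1,120, which over 150 observations is a little under $7.50 each, consistent with the "about $7" used in the estimate.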



Other Costs 

Other costs would include those associated with telephone, duplication, postage, 
and travel where needed. Travel could be expensive in a state like California unless 
regional assessors were used. A regional system of assessors that involved little travel 
would minimize the cost. Placing an estimate on the costs of these activities or ingredi- 
ents would depend in large part on the manner in which the system was ultimately 
designed and how costs were apportioned. Using a figure of $30 per assessment of 
[text illegible in source] minimal travel costs, based on our experience from 
the pilot testing. 

[text illegible in source] the following cost estimate to administer a revised 
version of [text illegible in source]. The cost could be as low as $117 per assessment if the 
observers could score the observation in one hour. 

Assessor: $120/assessment 

Training: $7/assessment 

Other: $30/assessment 

Total: $157/assessment 

The costs of implementing an assessment like the CCI and conducting multiple 
observations would depend on several factors as already mentioned. The costs for 
[text illegible in source] as a function of whether all candidates were observed 
[text illegible in source] or whether only candidates who failed to demonstrate proficiency 
[text illegible in source] assessment(s) were subsequently observed. Additional costs would also 
[text illegible in source] developing and managing the assessment system. But, the system 
[text illegible in source] this type of assessment might be merged with other 
[text illegible in source] management and related costs. Estimates for these should be 
made after some of these alternatives are explored and specified. 

Cost Summary 

[text illegible in source] experience from pilot testing a limited number of CCI assessments yields 
[text illegible in source] costs that might be associated with such an observation 
[text illegible in source] most of the ingredients that might go 
[text illegible in source] refined estimates need to be made after the assessments that might 
[text illegible in source] (e.g., methods for scoring) are made. 

Technical Quality 

[text illegible in source] technical issues related to the CCI -- development, 
reliability, and validity [text illegible in source]. 

Development 

As discussed in the Content section of this chapter, the development 
of the CCI [text illegible in source] approach and included several major steps. The first 
was the identification [text illegible in source] of competent new teachers. Another was a review of 
the literature on effective teaching. Drafts of CCI materials, including indicators and 
attributes, were reviewed by national experts. During March 1988, Connecticut conducted 
a pilot test with 42 assessors and 36 new teachers in 27 school districts. Teachers, 
teacher trainers, administrators, and other experts have been involved in all phases of 
the development process. 

Once the CCI was nearing the final draft stage, 1,582 Connecticut educators 
participated in a content validity study, reviewing the instruments in terms of relevance 
and importance. During 1988-89, Connecticut conducted a major field test involving 250 
new teachers in 67 school districts. This included teachers in vocational-technical schools 
and addressed the generalizability issue across subject areas and grade levels. A bias 
review was also conducted and a formal standard-setting process was completed. 

The development process used by Connecticut was sound. However, without 
additional information on the specifics of the steps used and individuals involved, little 
can be said about the quality of the effort. The available evidence suggests that the CCI 
was developed in a professional and technically acceptable manner. 



Reliability 

Several steps have been undertaken by Connecticut to help ensure a reliable 
assessment with the CCI. These include: (1) the training of assessors, (2) the selection 
of assessors who are experienced in teaching and in the subject area when feasible, and 
(3) the use of multiple observations of the same new teacher (six per teacher by differ- 
ent observers). In order to ensure consistent application and accuracy, Connecticut 
trainers review potential assessors on five areas - completion of both sides of the "t- 
sheets," appropriateness of the data used for evidence of the defining attribute, the 
inclusion of comprehensive data for evidence, the writing of evidence in a way that 
specifically links data to the defining attribute, and the listing of specific examples of 
classroom behaviors, activities or circumstances. All of these procedures are designed to 
reduce the error in CCI results and thus promote its reliability. 

However, no information was provided by Connecticut on other aspects of the 
CCI's reliability. Data that should be collected and reported include: 

o Inter-rater reliability - two or more observers of the same lesson at the 
same time, including different types of observers (e.g., teachers vs. 
administrators); 

o Stability of ratings - same teacher, same observer, different days; 

o Review by second observer - of script, ratings and other documentation; 
and 

o Monitoring - of rating patterns of each observer across several teachers 
to identify those observers whose ratings tend to be higher or lower 
on the average than other observers, or those who consistently rate 
certain indicators or attributes high or low compared to other observers, 
so that discrepant observers can be identified and retrained as needed. 






In addition, the training system should include systematic reviews, even for observers 
who are not discrepant, to minimize the chances of their starting to drift and to maintain 
standards. 
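
One way the inter-rater reliability data recommended above might be summarized is sketched below. This is an illustration only, not a procedure used by Connecticut or in this pilot test: the choice of percent agreement and Cohen's kappa, the function names, and the sample ratings ("A" for Acceptable, "U" for Unacceptable on ten indicators) are all assumptions made for the example.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of indicators on which two observers gave the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both observers must rate the same non-empty set of indicators")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two observers, corrected for chance agreement."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability both observers pick the same category
    # if each rated independently at his or her own base rates.
    expected = sum(counts_a[c] * counts_b[c]
                   for c in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used a single identical category throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical ratings of one lesson by two observers on the ten indicators.
    observer_1 = ["A", "A", "U", "A", "A", "U", "A", "A", "A", "U"]
    observer_2 = ["A", "A", "U", "A", "U", "U", "A", "A", "A", "A"]
    print(percent_agreement(observer_1, observer_2))
    print(round(cohens_kappa(observer_1, observer_2), 3))
```

In this made-up example the observers agree on eight of ten indicators, but because both rate most indicators "Acceptable," the chance-corrected figure is noticeably lower than the raw agreement. Reporting both would guard against overstating reliability when most ratings fall in one category.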



Validity 

Several steps, as described under "Development," were undertaken during the 
development stage to ensure the validity of the CCI, including the identification of 
important indicators of new teacher competence, the literature review, the review of 
drafts, and the content validity study. Validity must be judged in terms of the use of 
the instrument. Validity is not inherent in the instrument itself. An instrument consid- 
ered to be valid in Connecticut for teacher certification may or may not be valid for 
credentialing in California. 

Little can be said about the validity of the CCI for California or its appropriateness 
for various subject areas, grade levels, student groups, and school/community settings 
based on the pilot test. If the CCI or any other high-inference observation instrument 
is field tested, California should conduct a validity study that considers the 
appropriateness of the instrument for these various settings, its relevance to a new 
teacher's job, the importance of each attribute and indicator in effective teaching and in 
protecting students from teachers who lack certain competencies, and the fairness of the 
CCI to new teachers in terms of their opportunity to acquire the skills being observed. 
New teachers, teacher trainers, mentor teachers, and teacher supervisors should be 
involved in a review of the validity of the instrument. They should also be asked about 
the clarity of the content and the process. Lack of clarity in either area will negatively 
affect both the reliability and validity of the assessment instrument. 

Because validity addresses the accuracy of the decisions made, based on the 
CCI ratings, a validity issue for California is the question of how much additional information 
is provided for use in making credentialing decisions. For example, are any 
decisions changed when the observation ratings are used in conjunction with other data 
already available (e.g., college grades, NTE scores, CBEST scores)? In those cases 
where different decisions are made once the ratings are considered, are the changes 
warranted? Will students be better protected from teachers who lack needed competencies 
to teach effectively if an observation instrument is used in conjunction with 
currently available information or other sources of information? Will teachers be more 
fairly assessed if additional information is available? Related to this issue is the question 
of how many observations are needed for each new teacher. Are six observations necessary 
or is there enough information after four observations to make decisions for the 
majority of cases? If the latter, two additional observations might be done only for the 
borderline cases. These questions require much more data and a much larger sample 
than was available in the CCI pilot testing, but must be addressed prior to adoption of 
this or any other assessment instrument for credentialing in California. 






Conclusions and Recommendations 



This section contains conclusions and recommendations regarding the CCI, organ- 
ized into the areas of administration, content, format, and a brief summary. 
These conclusions and recommendations would likely apply to any high-inference obser- 
vation instrument. 



Administration of Assessment 

The administration of the CCI assessment is very labor intensive, requiring nearly 
one professional person day per teacher. Seven observations in five days were deemed 
stressful by the assessors and independent observers; one per day is a more feasible 
workload, unless substantial travel time is involved. If a subject and grade-level match 
between the teacher and assessor is desired, the complexity of scheduling increases 
markedly, probably increasing the time required to administer observations because of 
greater assessor travel time. 

The following factors seem to be key to smooth administration of the CCI in its 
present form: 

o making and confirming arrangements with both principals and teachers 
regarding the time of the observation and the locations of the pre- and 
post-observation interviews; 

o careful design of observation schedules for assessors, with no more than 
one observation scheduled per day; 

o development of procedures for obtaining completed assessment materi- 
als from assessors in the field; and 

o arrangements for storage of a large amount (at least 25 pages) of 
documentation per teacher. 

Since the CCI assessment is administered and scored by the same person, the 
training of assessors is also a key factor to successful administration of the CCI. 
Through training, assessor candidates are taught the content of the assessment, as well 
as how to conduct and score the assessment. Current training consists of seven instructional 
days plus time to conduct practice observations. Training could be improved 
through the inclusion of more specific examples of written evidence and more time spent 
analyzing evidence from previously prepared scripts instead of scripting from videotapes. 
It is unlikely, however, that the training could be shortened considerably. Some modifi- 
cations in the training would also be needed to accommodate the California context, 
reflecting the greater diversity of students, larger class size, and more frequent use of 
instructional aides. The training should conclude, as it does in Connecticut, with each 
assessor being required to exhibit a minimal level of proficiency in administering the 
assessment. 






Assessment Content 



Based on our observations and those of RMC staff, as well as information col- 
lected from assessors, teachers, and CCI rating sheets, we offer the following conclusions 
about the content of the CCI: 

o Congruence of the CCI with the various California curriculum guides 
and frameworks is relatively weak. This is largely because (1) the CCI 
was developed in the context of the Connecticut curriculum; (2) it is a 
noncurriculum-specific, high-inference observation system; and (3) it is 
not designed to measure a teacher's knowledge of curriculum directly. 

o Coverage by the CCI of the California Standards for Beginning Teachers 
varies. Coverage is particularly good for those standards which 
focus on student rapport, classroom environment, and student motivation, 
involvement, and conduct. Coverage is partial, however, for the 
majority of standards, and nonexistent for a few. Moreover, some 
standards partially covered, e.g., Curricular and Instructional Planning 
Skills, are difficult to measure using a classroom observation system. 

o The job-relatedness of the CCI seems to be high because the assess- 
ment entails observing teachers actually teaching in their own class- 
rooms. 

o Overall, the content of the CCI does not seem too difficult for begin- 
ning teachers. Approximately 80% of the pilot test participants 
received passing scores (i.e., received an "Acceptable" rating on at least 
seven of the ten indicators). 

o A variety of subjects, grade levels, community contexts, and instructional 
techniques were observed. The CCI appeared to focus on teaching 
abilities that are applicable in all K-12 instructional contexts. The 
appropriateness of the CCI for assessing teachers of classes with less 
emphasis on direct instruction and more emphasis on practice activity 
(e.g., physical education, band, vocational education) should be studied 
further. 



o Analysis of the rating results by grade level (i.e., elementary, middle
school, and high school) indicates that further pilot testing with a focus
on grade-level matches between assessors and teachers may be useful
and warranted.

o Subject-matter and grade-level matches between the assessor and the
teacher observed might complicate administration considerably, but they
would probably improve the instrument's ability to assess the
appropriateness of content and lesson development.

o Although the creators of the CCI were sensitive to the issue of teaching
diverse students and developed an assessment which focused on a
teacher's interaction with all students, the CCI would need to be modi-



If the CCI is chosen for further development, a content validity study should be
conducted in which California educators examine the instrument for the job relevance
and relative importance of its indicators.
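
The passing criterion reported earlier in this section (an "Acceptable" rating on at
least seven of the ten indicators) can be written out as a simple decision rule. The
sketch below is illustrative only: the function name and the "Unacceptable" label are
assumptions, since the report states only the "Acceptable" label and the
seven-of-ten threshold.

```python
from typing import Sequence

PASSING_THRESHOLD = 7   # from the report: at least seven "Acceptable" ratings
INDICATOR_COUNT = 10    # from the report: ten CCI indicators


def passes_cci(ratings: Sequence[str]) -> bool:
    """Return True if at least seven of the ten ratings are "Acceptable".

    Hypothetical helper; any label other than "Acceptable" counts against
    the teacher, because the report does not name the other rating levels.
    """
    if len(ratings) != INDICATOR_COUNT:
        raise ValueError(f"expected {INDICATOR_COUNT} ratings, got {len(ratings)}")
    acceptable = sum(1 for rating in ratings if rating == "Acceptable")
    return acceptable >= PASSING_THRESHOLD
```

Under this rule a teacher with exactly seven "Acceptable" ratings passes, while a
teacher with six does not.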



Assessment Format 

The classroom observation format will be discussed at length in Chapter 6 and
contrasted with the semi-structured interview and multiple-choice examination methods
of teacher assessment. One strength of the CCI format is that its focus is not on a
simulated performance, or on how a teacher says she/he would perform, or on a
teacher's knowledge of how to perform, but rather on a teacher's actual performance in the
classroom. In addition, because the teacher is observed in his/her own classroom, no
special facilities are required for administration.

The format of the CCI goes beyond traditional observation systems in which an
assessor checks off observed teaching behaviors. The CCI requires an observer to first
script as much as possible the entire lesson observed, and then to document from this
script the evidence supporting the existence or absence of the desired teaching
behaviors. Such careful documentation greatly reduces the risk of an observer's subjectivity
with regard to the teaching behaviors perceived and/or toward the teacher observed.



The CCI format also differs from other observation systems in that the actual
observation is preceded and followed by interviews which are designed to (1) help the
observer understand the instructional goals and classroom context which affect lesson
design, and (2) give the teacher an opportunity to explain and justify changes in the
original lesson design in response to unanticipated circumstances. The information
provided in the two interviews and through the pre-assessment information form (which
is completed by the teacher before the observation) allows the observer to conditionally
evaluate teacher behaviors in light of differing instructional goals and classroom contexts.
This observation instrument is superior to others used in teacher assessment because it
focuses on the meaning rather than frequency of teacher behaviors.

Finally, the format of this assessment requires that the participating teachers
receive complete information about the CCI, including descriptions of the indicators
being rated, copies of all the interview protocols, and a sample completed copy of the
pre-assessment information form. Based on the responses of the participating teachers
and assessors to these materials and to the CCI format, we recommend the following:

o In preparation for the CCI assessment, a teacher must be familiar with
a large amount of material (i.e., the content of the CCI), prepare a
lesson to meet CCI standards, and complete a pre-assessment information
form. Therefore, we agree with the Connecticut assessors who
believe that appropriate use of the CCI requires that teachers have
access to help in preparing for the assessment.

o If, as is done in Connecticut, mentor teachers are expected to help
teachers prepare for the CCI, it is crucial that the mentor teachers
(and/or others who give assistance) are well acquainted with the instrument,
and free to observe the beginning teachers often enough to be
acquainted with their usual teaching behaviors.

o Because both teachers and assessors expressed a desire that feedback
be a part of the CCI process, the provision of feedback should be
considered. Also, if the CCI is intended to serve as a guide for staff
development as well as a requirement for credentialing, then the scope
of the assistance needed by a beginning teacher to interpret the results
(i.e., feedback) needs to be investigated.



Summary

If classroom observations are selected as a form of teacher assessment for
credentialing purposes, the CCI could serve as a fully developed prototype. Reviews by
California educators may suggest that alterations should be made in the indicators and
standards, but the procedures for conducting the observation and methods of scoring
appear to need no further development.






CHAPTER 4: 

SEMI-STRUCTURED INTERVIEW: SECONDARY MATHEMATICS 






Developed by the State of Connecticut, the Semi-Structured Interview in Second- 
ary Mathematics is a performance assessment designed to assess the competency of 
beginning secondary mathematics teachers. Through an interview format, the assess- 
ment targets a beginning teacher's knowledge in the subject area of mathematics, ex- 
ploring a teacher's thought process as he or she makes instructional decisions for stu- 
dents. 

Two versions of the Semi-Structured Interview in Secondary Mathematics have
been developed by Connecticut. They are similar, but focus on two different topics:
(1) linear equations, and (2) ratio, proportions, and percent. Each version, however,
consists of the same five tasks:

(1) Structuring a Unit: A teacher arranges ten mathematical topics in a
sequence that is appropriate for teaching the unit, explains reasons for
the ordering based on training and experience, and discusses how the
chosen sequence might affect student learning;

(2) Structuring a Lesson: A teacher explains how a lesson might be
constructed on a topic represented by several pages of a textbook;

(3) Alternative Mathematical Approaches: A teacher is given alternative
solution strategies for a problem, chooses the approach(es) to use to
teach students, justifies the approach(es) selected, and discusses the
relative advantages and disadvantages of each strategy;

(4) Alternative Pedagogical Approaches: A teacher is shown five alternative
curriculum materials, selects the approach(es) to use to teach
students, justifies the approach(es) selected, and discusses the relative
advantages and disadvantages of each method; and

(5) Evaluating Student Performance: A teacher is shown samples of
student work that contain an error in the solution, identifies the error
made, and offers suggestions about remedial instruction for each kind
of error.

Since the two versions differ only in the focal topic, they will be discussed
together and will be treated as a single assessment format, referred to as the SSI-SM
throughout the chapter.

The SSI-SM format combines two assessment strategies: the semi-structured
interview and the assessment center. As an assessment strategy, semi-structured
interviews provide opportunities for candidates to respond orally to a standardized series of
questions about tasks that are presented verbally by an examiner who uses a script or
interview schedule. This interview is semi-structured in that it allows the use of
follow-up questions at the discretion of the assessor when a candidate's answer is judged to be
[illegible]. An assessment center strategy allows for simultaneous assessment
of [illegible] candidates, all of whom participate in a series of exercises or tasks
[illegible] otherwise be administered to candidates individually. In the case of the
[illegible] organized so that a group of candidates rotated through
[illegible] candidate completing a different subset of tasks in the same
time period. The order in which candidates performed the tasks was purposely varied.
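
One way to build a rotation of the kind described above is a cyclic schedule, in
which each candidate starts at a different station and moves on by one station each
period. The sketch below is an illustration under stated assumptions, not a description
of the procedure actually used in the pilot: the function name, candidate labels, and
station names are hypothetical, and the report does not say how the rotation was
constructed.

```python
from typing import Dict, List, Sequence


def rotation_schedule(candidates: Sequence[str],
                      stations: Sequence[str]) -> Dict[str, List[str]]:
    """Build a cyclic rotation (hypothetical helper).

    In period p, the candidate at position i is assigned to station
    (i + p) mod n, so no two candidates share a station in the same
    period and every candidate starts with a different station.
    """
    n = len(stations)
    if len(candidates) > n:
        raise ValueError("a rotation group cannot exceed the number of stations")
    return {
        candidate: [stations[(i + period) % n] for period in range(n)]
        for i, candidate in enumerate(candidates)
    }


# Illustrative labels only; these are not taken from the report.
schedule = rotation_schedule(["A", "B", "C", "D"],
                             ["Station 1", "Station 2", "Station 3", "Station 4"])
```

With four candidates and four stations, each candidate visits every station exactly
once, and the order of stations differs from candidate to candidate.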

The SSI-SM was developed by the Connecticut State Department of Education
[illegible] a three-tier assessment system that is designed to strengthen its
[illegible] and improve the quality of its beginning teachers. As briefly
[illegible] of Chapter 4, this system includes the following assessments
([illegible] which is administered at a different point in the beginning teacher's career): a
[illegible] and mathematics; a multiple-choice examination
[illegible] knowledge (secondary teachers are assessed in their area of
[illegible] elementary teachers are assessed with a custom-designed elementary
[illegible]); and a classroom observation assessment that evaluates teachers
on the [illegible] effective teaching. Together, these instruments serve as important
[illegible] knowledge and general pedagogical knowledge
(or teaching skills), but they do not assess the intersection of subject matter knowledge
[illegible] describes as pedagogical content knowledge.

[illegible] assessment program measures all essential
[illegible] decided to develop and add to its system
an instrument that would measure a beginning teacher's pedagogical content knowledge.

[illegible] instrument was a semi-structured interview with a focus in mathematics.
The subject area of mathematics was chosen for three reasons: (1) mathematics
[illegible] a more tightly defined knowledge base than such
[illegible] or language arts; (2) Connecticut State's math
[illegible] developing an instrument of this kind; and (3)
[illegible] a relatively [illegible] cognitive [illegible] the area
[illegible] mathematics.



The development of the SSI-SM proceeded as a collaboration between the CSDE
and [illegible] cognitive research on mathematics teaching who
[illegible] teachers, as well as a background in testing and
measurement. [illegible] stages of its development, the SSI-SM underwent
[illegible] researchers, curriculum experts, members of the State's Math
[illegible] who participated in the pilot study conducted
[illegible] 24 beginning and experienced teachers.
Although [illegible] instrument focused on math, the developers specifically designed it as
a prototype to be generalizable across disciplines.

The administration, content, and format of the SSI-SM are discussed below.
[illegible] of the cost analysis and technical quality of the assessment, the
[illegible] a summary of conclusions reached, together with recommendations
for [illegible] the feasibility and utility of the Semi-Structured Interview
in Secondary Mathematics (or a similar instrument) in California teacher assessment.






Administration of Assessments 



This section begins with an overview of the administration of the assessments,
which is followed by a discussion of the logistics of administering the SSI-SM.



Overview 



The SSI-SM was administered by six trained assessors from Connecticut, all of
whom conducted the interviews during the week beginning May 10, 1989, at two
different sites. The sample for this assessment was 20 secondary mathematics teachers.
Table 4.1 contains information about the number of teachers assessed from each local
pilot project, the assessment sites, and some of the characteristics (e.g., gender, grade
level, ethnicity, and teaching experience) of the participating teachers.

Logistics 

Logistical activities for this assessment included: (1) developing orientation
materials for teachers and principals; (2) identifying teacher samples; (3) making travel
arrangements for six trained assessors from Connecticut; (4) scheduling the test
administration; (5) arranging facilities; (6) acquiring materials for the administration of each
task; (7) arranging for the acquisition of evaluation feedback from teachers; and (8)
arranging for district reimbursement for the cost of substitute teachers and payment to
some teacher participants. Logistical arrangements are described in detail in the
Administration Report for Spring 1989.

Teachers received a two-page description of the assessment by mail prior to the
interview. Shortly before the interview began, they were given general information
[illegible] the purpose of the pilot test. Largely because this assessment had
not yet been pilot tested and thus sample responses were not available, the information
did not include a full range of descriptive sample material that is usually accessible to
candidates. Providing a full range of descriptive sample material, however, is particularly
important for this type of assessment, which departs dramatically in form from that of
current California teacher assessments.

Concerns for due process and equal access have motivated most test publishers
to provide candidates with orientation materials describing the examination's purpose,
content, format, length, and evaluation standards. Often sample assessment materials
are made available. If this or a similar assessment were adopted, teachers would need
timely delivery of materials with sufficient descriptive detail to allow for preparation and
review prior to the assessment. Assessment orientation materials would need to be
developed to provide this kind of information. Such materials might describe the
purpose, format, and rationale for this new type of format, provide sample tasks and
component questions, and discuss the type and range of potential topics and, most
importantly, the criteria for evaluating candidates' responses. All current topics that
might be assessed could be published and sent to all "registered" candidates a month
before the assessment. Orientation materials for performance assessments such as essay





TABLE 4.1

SEMI-STRUCTURED INTERVIEW IN SECONDARY MATHEMATICS:
PILOT TEST PARTICIPANTS

(Total Number of Teachers=20)

Descriptive Characteristics        Distributions of
of Participants                    Participants

Teaching Experience
   First Year                      13
   Second Year                      7

Teaching Level
   Middle Level                     8
   High School                     [missing]

Gender
   Male                            11
   Female                           9

Ethnicity
   American Indian                  1
   Asian-American                   1
   Hispanic                         2
   White                           16

Location of Teaching
   Fresno                          12
   New Haven                        4
   Oakland                          4






examinations sometimes also recommend content and strategies that candidates
might review in preparation for the assessment, and provide annotated examples
of how responses are evaluated.

Based on the previous pilot experience, the tasks were refined and grouped into
two sets so that each set of tasks took approximately equal amounts of time; the
California experience was that the effort to balance the length of time of tasks was largely
successful. Due to individual differences, there is almost always some variation in the
length of time. Therefore, some arrangement must be made for smooth transitions
between tasks. Procedures for handling especially verbose teachers who take longer
periods of time need to be established. Teachers who do not communicate useful
information in light of the scoring criteria can be prompted to finish; the more difficult
decision is when to cut off the occasional teachers who take a long time to complete
tasks because of superior breadth, depth, and detail.

Facilities required for this assessment included four interview rooms (one for
each task) and one coordination room (for assembling before and between interviews)
for every day of interviewing. Although FWL staff investigated a wide range of sites,
we experienced severe difficulties in locating appropriate facilities with large numbers of
small rooms. If the assessment were held on weekends or during the summer, vacant
school or college classrooms could be utilized; for assessments held during the school
week, similar problems in locating facilities can be anticipated.

The interviews were videotaped to provide a visual record for scoring the
teachers' responses. (The scoring system was developed at a later date, precluding the
option of scoring simultaneously with administration.) The use of videotaping
equipment precipitated some disruptions and delays due to technical problems. Clearly, if
videotaping continues as an assessment component, a technician needs to be close at
hand. However, the necessity of videotaping rather than audiotaping should be further
considered. To date, there seems to be little indication that any visual information is
pertinent to evaluating responses or monitoring the assessors. Audiotaping is less
tech[illegible] effective, in addition to preserving the anonymity of
the candidates.

As with the other assessments, teachers were asked to complete a questionnaire
asking their opinions of the SSI-SM in which they had just participated. Teachers were
not asked to differentiate between the tasks for linear equations and those for ratios,
proportions, and percents. Assessors also completed a feedback form on the final day
of the assessment. They were asked to provide their perceptions of the adequacy of
their training to administer the instrument, the logistical arrangements and facilities, the
assessment format, the fairness of the instrument, and its appropriateness for assessing
the teaching competence of new teachers.



Security 

As with all assessments, the security of teacher evaluations is required. The
extent of security necessary for the assessment materials is unclear. On the one hand,
the answers to questions for each task are interrelated. One could not memorize
isolated answers to questions, but would need to memorize an entire script. On the other
hand, at one time, the state of Georgia included a semi-structured interview to gather
information about a teacher's individual portfolio as part of its assessment instrument.
The standardized questions allowed the development and teaching of standardized
answers to beginning teachers, which circumvented the effectiveness of the assessment,
so the interview portion of the Georgia assessment was deleted. Before the adoption of
this or any other semi-structured interview assessment, the robustness of
semi-structured interviews with respect to development of standardized answers would need
to be investigated.



Assessors and Their Training 

Six trained assessors from Connecticut participated in the pilot testing. As was
the case with the CCI pilot test (Chapter 3), the use of trained assessors from
Connecticut rather than California assessors served to reduce the costs (both in terms of time
and money) of the SSI-SM pilot test. Using trained assessors from Connecticut both
reduced the time needed to coordinate the pilot test (e.g., no assessor recruitment or
training necessary) and eliminated the costs associated with these two activities.

The six Connecticut assessors were all mathematics teachers with over five years
[illegible]. Each participated in a one-day training session
which included background information about the semi-structured interview,
approximately two hours of lecture on methods of interviewing, and three hours of practice in
administering interviews. Following the training, in November 1988, the assessors
participated in a pilot study of the SSI-SM in Connecticut, administering one complete
interview to 10 teachers. In preparation for the Spring 1989 pilot testing in California, a
[illegible] interpersonal communications gave the assessors refresher training in April
1989. The refresher training consisted of (1) roughly two hours of lecture on findings
from the November pilot study; (2) a one-hour discussion of interviewing weaknesses
[illegible] of November videotapes (e.g., probing tactics, establishing
rapport with the candidate, and maintaining standardization across the interviewees); (3)
a one-hour group discussion concerning changes made in the protocols; and (4)
one-and-a-half hours of interviewing mathematics teacher education students using the new
protocols. One important change that resulted from the November pilot was that each
assessor was trained to administer a set of several tasks rather than a complete
interview.

In the pilot test there were six assessors for four activity stations. Based on this
experience, FWL has determined that four qualified assessors can adequately handle
four stations. Additional assessors might be used for either training or coordination
purposes, but are not needed for managing the assessment.

All of the assessors believed that their training had been adequate. Three
assessors mentioned the importance of practice in administering the tasks, with one of them
remarking that the training was inadequate without it. Two of these assessors also
mentioned the work on probes as being useful. The only suggestion for improvement
came from another assessor who would have liked more feedback on his performance.

Knowledge of the subject area is necessary to construct appropriate probes.
Although our observations of the questioning indicated that the scripts prompted a
fairly high degree of comparability, there was some unevenness in probing for additional
detail or explanation. Analysis of pilot tapes suggests the desirability of such probes to
better reflect the extent of the teacher's knowledge. However, the degree of probing
can affect the rating of a teacher's response, with uneven probing affecting the fairness
of the assessment across teachers. This is a dilemma which needs further exploration,
with particular attention to such issues as the variability of assessor probing, the
conditions under which the variability occurs, and the implications of the analysis for
selection and further training of assessors. Improved alignment between questions and
scoring criteria would also reduce, but probably not eliminate, the necessity for probing.
During meetings on the development of the scoring system, some interviewers suggested
that explications of the scoring criteria should inform the development of guidelines for
types and degrees of probing beyond the scripted questions.

Assessor training will be significantly altered if interviewers also serve as raters.
Whether having the interviewer assume both roles will negatively affect the composure
and/or performance of the candidates cannot be determined from this pilot test, but this
issue will be a significant question for further pilot or field testing. The circumstances
under which and the manner in which an assessor takes notes, probes, and reacts to
candidates' responses will need careful attention.



Teacher and Assessor Impressions of Administration

Seventeen of the teachers felt that the arrangements were reasonable. Two
specific suggestions for improvement were longer breaks and earlier notification.

Assessor comments focused on equipment and facilities. Several mentioned the
importance of setting up the video equipment well in advance and instructing the
assessors in its use. Two assessors commented that the use of hotel guest rooms did not
contribute to a "professional" atmosphere. However, another assessor had the contrary
impression, citing the desk-and-chair setting in a small hotel room as more professional
than the use of a conference table in a mid-sized meeting room. One assessor was
distracted by noise coming from the room next door.



Scoring 



One purpose of the administration and especially the videotaping of the
assessment was to allow the further development of the scoring system. (The videotaping
allowed repeated viewing of an interview and the testing of different scoring methods.)
The scoring approach has continued to evolve during the development of the
semi-structured interviews. A team of consultants, Connecticut State Department of
Education staff, and committees of Connecticut teachers have worked to augment the design
of a scoring system that was pilot tested in Connecticut during the 1987-88 school year.
The scoring system is in the final stages of development, but is not a completely
developed prototype at this time.

One purpose of the continued development was to identify knowledge domains,
key indicators, and quality criteria that might be applied across subject areas. The
emerging scoring approach specifies three domains of expertise: Curricular/Content,
Pedagogy, and Knowledge of Students. There are currently two indicators or clusters of
knowledge within each domain. To facilitate understanding of each indicator, it is
[illegible] dispositions.

Math Scoring Indicators and Indicator Elements

The specific indicators as of the fall of 1989 are as follows: 

o CC1: Understands [illegible] concepts of the content area.
[illegible] knowledge. Candidates are not
[illegible] mathematicians; however, inaccuracies
[illegible] terminology, inappropriate
[illegible] all point to weaknesses as a mathematician
that will interfere with effective instruction. Elements include:

- mathematics
- mathematical terminology

o CC2: Understands mathematical interrelationships among topics and
[illegible] relationships [illegible]
based on interrelationships that create an [illegible]
curriculum for instruction. CC2 addresses both the [illegible]
perspective of the content area. Elements include:

- identifying prerequisite knowledge and skills
- sequencing topics based on a mathematical perspective
- grouping topics based on the mathematics addressed
- identifying real world applications of topics
- linking content to specialized skills (e.g., critical thinking)
- linking content to a broader curriculum
- [illegible] materials, etc., as related to the broader curriculum

o CP3: Understands effective practices, successful approaches, and
poten[illegible] the core of the content-bound instructional knowledge. It asks whether
[illegible] teacher of mathematics capable of
integrating [illegible] general pedagogical skills and his or her
knowledge [illegible] should measure the common
[illegible] "pure mathematics" and "pure pedagogy." CP3 should also
[illegible] of the teacher's instructional repertoire of content
examples, analogies, materials, etc. Elements include:

- examining alternative approaches to instruction on the basis of
content
- examining the relative importance of topics
- examining the relative difficulty of concepts
- anticipating problems all students will encounter
- adjusting instruction based on mathematical context and practical
considerations
- identifying supplementary instructional materials
- selecting instructional approaches that are appropriate to the
instructional objectives
- selecting instructional activities that are appropriate to the
instructional objectives
- demonstrating an instructional repertoire appropriate to the
content area, including examples of concepts, effective analogies,
multiple procedures for teaching concepts, representative analogies,
and sound presentations

o CP4: Understands effective instructional practices that facilitate learning
and are independent of the subject area. This indicator measures
the candidate's general pedagogical knowledge. Any evidence of sound
instructional approaches that are independent of the content area
should be credited under CP4 rather than CP3. Skills taught in a traditional
teacher preparation program related to classroom management,
lesson planning, lesson monitoring, routines and transitions, and general
evaluation are represented in this indicator. Elements include:

- structuring a lesson
- providing clear opening and closing to a lesson
- monitoring student understanding during direct instruction
- monitoring time on task
- maintaining routines to facilitate transitions from one activity to
another
- maintaining a sense of order in the classroom
- encouraging student responsibility for their own learning
- selecting appropriate grouping and other instructional strategies
- fostering independence and interdependence of learners
- evaluating student work (formal and informal)
- providing feedback to students
- evaluating instructional outcomes

o KS5: Justifies instructional practices and approaches on the basis of
student background and interests. This indicator measures the extent
to which the teacher considers the background, needs, and interests of
his or her students, or students in general, in selecting instructional
approaches that facilitate learning. One component of the indicator is
the consideration of motivational strategies. Elements include:

- soliciting information about student background and interests
- selecting lesson activities, presentations, and explanations that
reflect student background and interests
- designing instruction that considers the self-concept/self-esteem
needs of students
- connecting instruction to the real world experiences of students
- building on the informal and intuitive knowledge of the students






o KS6: Justifies instructional practices and approaches on the basis of
student abilities. Attention to student ability and the need to monitor
and adjust instruction based on ability grouping, setting appropriate
standards, etc. all contribute to this indicator. Elements include:

- soliciting information about student abilities
- selecting lesson activities, presentations, and explanations that
reflect student abilities
- modifying instruction to build on a student's existing mathematics
knowledge base
- developing alternate approaches to instruction for a given concept
based on a range of individual student skills
- identifying special approaches to instruction for highly capable/less
capable students

Videotapes of teachers were viewed by the Connecticut development team and a
number of Connecticut math teachers to identify instances of specific levels of
performance. Once consensus was reached on these "marker tapes," the performances were
compared to identify key distinctions between them. These distinctions were codified
into descriptions of performances for each rating category. The marker tapes were then
used to anchor the professional judgments of the scorers to the set of established
standards.


Scoring Process 

Although future plans call for the development of a scoring system which is
used while the teacher is being interviewed, at present, all interviews are
videotaped and then scored offsite. The current scoring system requires the
scorer to view the videotape of a task and then record evidence in one of three
columns representing the three scoring domains described above. Evidence
consists of notations of appropriate or inappropriate statements about
mathematical concepts, instructional techniques, or students by the teacher.
Once the viewing is complete, the scorer reviews the evidence recorded under
each knowledge domain and codes the statements according to the indicators. At
this point, the scorer may decide that the evidence has been misclassified as
to domain, and reclassify it into a more appropriate domain. Proper
classification is critical to reliable scoring. To assist the scorer in
appropriate classification of evidence, elements of each indicator have been
further delineated to guide the coding process.

For each of the indicators, the rater then uses three response characteristics
to evaluate the quality of the candidate's explanations and justifications: the
appropriateness of statements; the breadth of the repertoire; and the depth with
which the candidate provides specific, reasoned examples. The rater weighs the
importance of each of these criteria and evidence for them in deciding on a
summary rating for each indicator. The summary ratings, in increasing order of
proficiency, are: "insufficient," "marginal," "sufficient," and "proficient."
The rater writes key evidence from the candidate's statements that support and
explain the summary rating. These summaries of key evidence, in turn, can be
used to defend the ratings and provide feedback to the candidate. To date, it
has not been decided how ratings for indicators at the task level will be
aggregated into summative judgments either at the indicator level across tasks
or across the entire assessment for credentialing purposes.
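
The recording and coding steps of the scoring process could be represented with
a small data structure. The following is only an illustrative sketch in Python,
not part of the Connecticut scoring system: the class names, field names, and
the sample statement are hypothetical, while the three domains, the KS6
indicator code, and the four summary ratings are taken from the text. The
summary rating is entered as the rater's judgment; the sketch does not compute
it, and because no aggregation rule has been decided, none is implied.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class SummaryRating(IntEnum):
    """The four summary ratings, in increasing order of proficiency."""
    INSUFFICIENT = 1
    MARGINAL = 2
    SUFFICIENT = 3
    PROFICIENT = 4


# The three scoring domains named in the report.
DOMAINS = ("Curriculum Content", "Content Pedagogy", "Knowledge of Students")


@dataclass
class Evidence:
    statement: str       # scorer's notation of what the teacher said
    appropriate: bool    # appropriate vs. inappropriate statement
    domain: str          # column in which the evidence was recorded
    indicator: str = ""  # indicator code, assigned after viewing


@dataclass
class TaskScoreSheet:
    task: str
    evidence: list = field(default_factory=list)
    ratings: dict = field(default_factory=dict)  # indicator -> SummaryRating

    def record(self, statement, appropriate, domain):
        """Record one piece of evidence while viewing the videotape."""
        if domain not in DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
        item = Evidence(statement, appropriate, domain)
        self.evidence.append(item)
        return item

    def reclassify(self, item, new_domain):
        """Move evidence the scorer judges to be in the wrong domain."""
        if new_domain not in DOMAINS:
            raise ValueError(f"unknown domain: {new_domain}")
        item.domain = new_domain

    def rate(self, indicator, rating):
        """Store the rater's judgment; no rating is computed here."""
        self.ratings[indicator] = SummaryRating(rating)

    def key_evidence(self, indicator):
        """Evidence coded to an indicator, for defending a rating."""
        return [e for e in self.evidence if e.indicator == indicator]


# Hypothetical usage: one statement recorded, reclassified, coded, and rated.
sheet = TaskScoreSheet(task="Constructing a Lesson")
item = sheet.record("Would group students by prior algebra grades", True,
                    "Content Pedagogy")
sheet.reclassify(item, "Knowledge of Students")
item.indicator = "KS6"
sheet.rate("KS6", SummaryRating.SUFFICIENT)
```

Keeping the rating as an ordered value makes comparisons such as "at or above
sufficient" straightforward, without committing to any particular passing
standard.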

There is general agreement among the scoring development team that the
present configuration of interview tasks does not yield information sufficient
to score all indicators. The tasks are to be analyzed to identify the
indicators for which the questions elicit sufficient information to make
reliable judgments. Further development work, by either adding questions to
tasks or developing new tasks, is considered necessary before the assessment
can be viewed as a completed prototype. Also, indicators addressing the ability
to design summative evaluations of student learning and the capacity to
reflectively evaluate one's own teaching are being considered as possible
additions to the present list.

The struggle in developing this scoring system, or any scoring system for a per- 
formance assessment, has been to identify the types of responses that typify effective 
teaching and to specify criteria for evaluating the quality of the responses. Connecticut 
teacher committees were asked to consider the following questions in weighing alterna- 
tive scoring approaches: 

(1) Is the approach based on a theoretical rationale that explains how it 
characterizes effective teaching? 

(2) Are the behaviors and criteria derived from empirical research?

(3) Are the behavioral and quality indicators descriptive and objective, not
subjective?

(4) Is the language specific enough to be clear?

(5) Will behaviors and criteria generalize to other topics in the subject
domain?

(6) Can ratings and supporting evidence provide constructive feedback?

Answers to these questions must precede decisions about the numerical range of 
the rating scale, the number of ratings, and the development of the training system.
California may want to consider the Connecticut rating dimensions and criteria accord- 
ing to some of the questions above. 



Discussion of Scoring System

Significant progress has been made on the development of a scoring system for
semi-structured interviews. The Domain-Indicator-Element structure seems
feasible not only for secondary math; it also has high potential for serving as
a prototype for scoring systems for interviews in other subjects. This system
seems more useful than curriculum-specific, grade-specific or task-specific
ones such as that of the SSI-EM (which will be discussed in the next chapter).
However, this favorable evaluation is based on professional judgments rather
than any strong empirical evidence. Such evidence would consist of the
development of parallel semi-structured interviews in other subjects.






The domains seem to adequately cover the range of responses [illegible] are
collectively adequate [illegible] breadth; it seems likely that more indicators
[illegible] needed. There is general agreement that [illegible] evaluation and
reflective [illegible] informal and formal [illegible] responses into separate
categories. A response falls [illegible] potential revisions of tasks and
questions suggest that the nature of the problem [illegible] be generated for
specific indicators. [illegible] indicators was intended to make them more
independent of each other [illegible] to facilitate classification of evidence.
The degree of interrater reliability [illegible] a decision [illegible] whether
future interviews will be scored by two or more raters. [illegible] then it may
be desirable to [illegible] the interviews of teachers who fail or who score
near the passing standard.

[illegible] a potential problem in that the summative scoring categories
[illegible] passing standards in their use of "sufficient" and "insufficient"
[illegible] system of clearly defined categories. It would be preferable to
have a [illegible] of performance that are defined independently of the passing
standard [illegible] decoupling standards from summative ratings would allow
the state [illegible].

Scorers and Their Training 

The training system is also currently under development, using a holistic
strategy [illegible] presenting numerous practice exercises on responses
typical of each rating category [illegible] make this a more difficult
[illegible] reading short essays. The identification and use of sample
responses that merit a particular rating on a specific indicator are called
"marker tapes." [illegible] seem critical to reliable [illegible]. Future rater
training will probably need to continue the use of videotapes for [illegible]
being interviewed about an entire task to illustrate [illegible]. Although the
domains and indicators are identical for all tasks, scorers [illegible]
definitions of the level of performance corresponding to specific rating
categories.




[illegible] should conclude with a qualifying set of exercises where
[illegible] independently. If the trainee's ratings agree with [illegible]
validation committee, the rater qualifies to rate independently [illegible]
in applying the criteria. This step is seldom [illegible] assessments, but
should be in order to establish the quality and credibility of the raters.
Periodic checks or tests of scorers' agreement with pre-scored vignettes are
also infrequent, but are recommended procedures for maintaining agreement
levels and for preventing rater drift.
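
A qualifying check of the kind recommended here, comparing a trainee's ratings
with pre-scored ratings, could be computed along the following lines. This is a
minimal sketch in Python: the function names and the 80 percent exact-agreement
cutoff are assumptions made for illustration, since the report does not specify
an agreement criterion. Only the four rating labels come from the text.

```python
# Rating labels from the report, in increasing order of proficiency.
RATING_ORDER = ["insufficient", "marginal", "sufficient", "proficient"]


def agreement_rates(trainee, reference):
    """Return (exact, adjacent) agreement between two lists of ratings.

    exact    = share of items given the identical rating
    adjacent = share of items rated within one category of the reference
    """
    if not trainee or len(trainee) != len(reference):
        raise ValueError("rating lists must be non-empty and of equal length")
    gaps = [
        abs(RATING_ORDER.index(t) - RATING_ORDER.index(r))
        for t, r in zip(trainee, reference)
    ]
    n = len(gaps)
    exact = sum(1 for g in gaps if g == 0) / n
    adjacent = sum(1 for g in gaps if g <= 1) / n
    return exact, adjacent


def qualifies(trainee, reference, min_exact=0.8):
    """Hypothetical rule: qualify if exact agreement meets the cutoff."""
    exact, _ = agreement_rates(trainee, reference)
    return exact >= min_exact


# Hypothetical qualifying set of five pre-scored exercises.
reference = ["marginal", "sufficient", "sufficient", "sufficient", "marginal"]
trainee = ["marginal", "sufficient", "proficient", "sufficient", "insufficient"]
exact, adjacent = agreement_rates(trainee, reference)
```

Running the same comparison periodically against pre-scored material would give
a simple way to watch for rater drift over time.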

Assessment Content 

The content of the assessment includes the knowledge and strategies to be
[illegible] intended to elicit teachers' expertise. The tasks of the
[illegible] interviews have been designed to represent significant, recurring
activities [illegible] plan their instruction, present and adapt it, evaluate
[illegible] progress, and reflect upon the effectiveness of their teaching.
Ideally, the tasks should represent the range of topics that the candidate will
be credentialed to [illegible] a range of [illegible] students. The tasks are
intended to tap the [illegible] command of content, pedagogy, and knowledge of
students. Task [illegible] candidates to describe what they would do and to
explain why they would do it.

This section discusses the following content-related aspects of the assessment:

o Congruence with California's curriculum framework and standards;

o Congruence with California's Beginning Teacher Standards;

o Job-relatedness of the content;

o Appropriateness for beginning teachers;

o Appropriateness as a method of assessment;

o Fairness across groups of teachers;

o Appropriateness across different teaching contexts.

Congruence with California Curriculum Guides and Frameworks

The SSI-SM was developed by Connecticut for use with Connecticut teachers.
FWL compared the tasks and the September 1989 version of the scoring system with
a California mathematics curriculum document, to see if they were congruent.

The topics chosen as the focus of the assessment are included in the strands, or
groups of topics, in the secondary mathematics curriculum described in the 1985
Mathematics Framework for California Public Schools: Kindergarten through Grade
Twelve (which will be referred to as the Mathematics Framework). Ratio and
proportions are part of the number strand and are to be taught by the end of the
eighth grade. Linear equations are part of the algebra strand and may be
introduced in one of two ways. The first possibility is in a ninth-grade algebra
class as part of the preparation for higher level mathematics. The second
possibility, for students who do not intend to pursue advanced mathematics, is
in the first year of a two-year sequence that equips students with the basic
mathematical knowledge needed in a technological society.

The Mathematics Framework lists five major areas of emphasis: (1) problem
solving (by which is meant the ability to solve applied problems, and not the
routine application of algorithms to textbook problems); (2) calculator
technology; (3) computational skills; (4) estimation and mental arithmetic; and
(5) computers in mathematics education. The SSI-SM addresses computational
skills and computers in mathematics education. Problem solving was the focus of
one element of a scoring criterion, though no questions specifically asked
about strategies to teach problem solving; calculator technology, estimation
and mental arithmetic were not addressed at all. The specific ways in which the
emphases were included in the assessment were:

o Although the scoring system is still under development, practice scoring
sessions observed by FWL staff included discussions of scoring the
domain of Content Pedagogy which clearly indicated that the emphasis
in scoring is on teaching students mathematical concepts and reasoning
instead of memorization of mathematical algorithms. This is consistent
with the discussion of computational skills in the Mathematics
Framework.

o One of the teaching techniques in Alternative Pedagogical Approaches
was the use of computer software. Teachers were asked to relate the
advantages and disadvantages of using the particular software compared
to alternative strategies that might be used to teach the same
concept. So while a software package was included, teachers were not
asked to discuss the broader range of potential uses of computers in
the classroom.

o One of the response criteria (i.e., appropriateness) for the scoring
domain of Content Pedagogy includes evaluation of the teacher's modeling
of problem-solving processes and operations, but there are no specific
tasks or questions which would direct a teacher to explain how
she/he would do so. To fully address this Mathematics Framework
emphasis, tasks and component questions might explore the teachers'
range of techniques for promoting students' problem-solving strategies.

In addition to the five major areas of emphasis, the Mathematics Framework
emphasized the following characteristics in terms of the delivery of
instruction in mathematics: teaching for understanding, reinforcement of
concepts and skills, problem solving, situational lessons, use of concrete
materials, flexibility of instruction, corrective instruction/remediation,
cooperative learning groups, mathematical language, and questioning and
responding. About half of these areas were elements in the SSI-SM scoring
indicators. The tasks and component questions directly addressed teaching for
understanding, flexibility of instruction, corrective instruction/remediation,
and mathematical language. Teaching for understanding, as discussed previously,
was at the heart of the Content Pedagogy domain in the scoring system. One
question in Constructing a Lesson specifically asked how and why a teacher
might foster problem solving or critical thinking during the lesson.
Flexibility of instruction was addressed by questions in Constructing a Lesson
and Alternative Pedagogical Approaches that asked about highly and less capable
students. Corrective instruction/remediation was the focus of Evaluating
Student Performance. The teacher's correct use of mathematical language and
concepts was one of the indicators of the Curriculum Content domain of the
scoring system.

While identifying real world applications, an aspect of situational lessons, is one 
element of one of the Curriculum Content indicators, there are no questions or activi- 
ties which would directly cue this type of response. The same was true for fostering 
interdependence of learners, an element of the Content Pedagogy scoring domain. 

Table 4.2 summarizes the extent of coverage of the Mathematics Framework.
The tasks as they are presently constituted address in depth computational
skills, teaching for understanding, corrective instruction/remediation, and
mathematical language. Although the scoring system as it presently stands has
substantial coverage of the content and instructional emphases of the
Mathematics Framework, the tasks themselves must be modified or, in some cases,
redesigned to more directly address the remaining emphases and to collect
enough information to enable the scorers to make judgments for all emphases of
the Mathematics Framework.



Extent of Coverage of California Standards for Beginning Teachers

FWL compared the SSI-SM tasks and September 1989 version of the scoring
criteria with the 11 California Beginning Teacher Standards that student
teachers are expected to attain when they complete California teacher
preparation programs. The standards are composed of a general statement
describing the competency together with factors which illustrate the
subcomponents of the competency. Each standard is discussed separately (Listed
below are brief descriptions of Standards 22 through 32 with each standard
defined in italics, along with descriptions of how the CCI indicators
correspond to the standards), along with a discussion of how the tasks in the
SSI-SM address each standard.

Standard 22: Student Rapport and Classroom Environment. Each candidate
establishes and sustains a level of student rapport and a classroom environment
that promotes learning and equity, and that fosters mutual respect among the
persons in a class. None of the tasks in the SSI-SM address this standard.
Indeed, except for the "clearly stated expectations regarding student conduct"
specified by the standard, the other factors in the standard address teacher
behavior when interacting with students. This might be difficult to simulate in
an interview situation, compared to observing actual teaching. Questions could
be developed which take one of two approaches: (1) ask a teacher to explain how
they judge and evaluate their classroom environment and rapport with their
students; or (2) require a teacher to evaluate and offer suggestions for a
hypothetical class. However, the relationship between teacher responses to
these tasks and their observed ability to establish rapport is likely to be
slight.






TABLE 4.2

COVERAGE OF THE CALIFORNIA
MATHEMATICS FRAMEWORK BY SSI-SM

Content                             Method of Coverage                            Extent of Coverage

Areas of Emphasis:

Problem Solving                     An element of an indicator; requires          Partial
                                    development of tasks or questions to
                                    assess fully.

Calculator Technology               Would require development of new tasks or     None
                                    questions.

Computational Skills                Major focus of tasks and questions; implicit  Partial
                                    in scoring criteria and could be
                                    strengthened.

Estimation and Mental Arithmetic    Would require development of new tasks or     None
                                    questions.

Computers in Mathematics Education  A software program is one of a series of      Partial
                                    pedagogical approaches compared and
                                    contrasted.

Delivery of Instruction:

Teaching for Understanding          Implicit in tasks; major theme of indicator.  Full

Reinforcement of Concepts           Not directly addressed by tasks or            None
  and Skills                        questions; could be scored under an
                                    indicator.

Problem Solving                     Not directly addressed by tasks or            None
                                    questions; an element of indicator.

Situational Lessons                 Not directly addressed by tasks,              None
                                    questions, or indicators.

Use of Concrete Materials           Not directly addressed by tasks,              None
                                    questions, or indicators.

Flexibility of Instruction          Focus of questions in two tasks; breadth      Full
                                    contributes to rating for all indicators;
                                    major theme of indicator.

Corrective Instruction/Remediation  Focus of one task; an element of indicator.   Partial

Cooperative Learning Groups         Not addressed by tasks, questions, or         None
                                    indicators.

Mathematical Language               Implicit in tasks; major theme of indicator.  Full

Questioning and Responding          Not addressed by tasks, questions, or         None
                                    indicators.
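
The coverage judgments in Table 4.2 can be tallied by section. The sketch
below, in Python, simply transcribes the Extent of Coverage column and counts
the entries. The variable and function names are illustrative, and the entry
for Estimation and Mental Arithmetic is read as "None" on the assumption that
the garbled scan matches the method-of-coverage note for that row.

```python
from collections import Counter

# Extent of coverage transcribed from Table 4.2 (SSI-SM vs. the
# California Mathematics Framework). Keys are the table's row labels.
TABLE_4_2 = {
    "Areas of Emphasis": {
        "Problem Solving": "Partial",
        "Calculator Technology": "None",
        "Computational Skills": "Partial",
        "Estimation and Mental Arithmetic": "None",  # assumed reading
        "Computers in Mathematics Education": "Partial",
    },
    "Delivery of Instruction": {
        "Teaching for Understanding": "Full",
        "Reinforcement of Concepts and Skills": "None",
        "Problem Solving": "None",
        "Situational Lessons": "None",
        "Use of Concrete Materials": "None",
        "Flexibility of Instruction": "Full",
        "Corrective Instruction/Remediation": "Partial",
        "Cooperative Learning Groups": "None",
        "Mathematical Language": "Full",
        "Questioning and Responding": "None",
    },
}


def tally(section):
    """Count how many rows in a section are Full, Partial, or None."""
    return Counter(TABLE_4_2[section].values())


def overall():
    """Count coverage levels across both sections of the table."""
    total = Counter()
    for section in TABLE_4_2:
        total += tally(section)
    return total


for name in TABLE_4_2:
    print(name, dict(tally(name)))
print("Overall", dict(overall()))
```

A tally of this kind makes it easy to see at a glance how many framework
emphases would need new tasks or questions before the assessment could address
them.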






Standard 23: Curricular and Instructional Planning Skills. Each candidate
prepares at least one unit plan and several lesson plans that include goals,
objectives, strategies, activities, materials and assessment plans that are
well defined and coordinated with each other. This is the focus of the task,
Constructing a Unit, and is partially addressed by the task of Constructing a
Lesson. Constructing a Unit requires each teacher to order mathematical topics
in a unit according to the best way to teach them. The appropriateness of the
ordering and the teacher's explanation of the reasons underlying the ordering
serve as evidence for two scoring domains: Curriculum Content and Content
Pedagogy. (The measurement of the teacher's understanding of the selected
mathematical topics and their interrelationship is beyond the scope of this
standard, but such content knowledge is necessary to plan effective
instruction.) Constructing a Lesson asks a teacher to plan a lesson on a given
topic; however, only the single lesson and not the preceding or following
lessons are described, so there is no opportunity to judge the coordination or
development of a series of lessons. The SSI-SM is therefore only partially
congruent with this standard.

Standard 24: Diverse and Appropriate Teaching. Each candidate prepares and
uses instructional strategies, activities and materials that are appropriate for
students with diverse needs, interests and learning styles. This standard is
addressed to some extent by the tasks, Constructing a Lesson and Alternative
Pedagogical Approaches, both of which ask questions about altering choices for
"highly" and "less" capable students. The scoring domain, Knowledge of
Students, has one indicator which specifically addresses adjusting instruction
for students of different abilities and one which includes designing
instruction to reflect student background and interests. There are, however, no
questions which directly address the latter. Similarly, building on prior
student learning is an element of the scoring domain of Content Pedagogy, but
there are no questions which elicit information about how teachers plan to do
this. Diversity of interests beyond academic interests and the use of a variety
of approaches and materials that are free from bias are addressed neither by
the SSI-SM tasks nor by the scoring system. The addition of questions or
vignettes or the development of a new task would be necessary to fully address
this standard.

Standard 25: Student Motivation, Involvement and Conduct. Each candidate
motivates and sustains student interest, involvement and appropriate conduct
equitably during a variety of class activities. One question in the task of
Constructing a Lesson asked how students would be actively involved during the
lesson. The response to this question would most likely yield information that
could serve as evidence for the indicator addressing motivation of students in
the scoring domain, Knowledge of Students. The task of Alternative Pedagogical
Strategies also asked the teachers to take into consideration the students'
needs and interests. While "monitoring time on task" and "maintaining a sense
of order in the classroom" are elements of the scoring domain of Content
Pedagogy which address this standard, no questions or tasks directly elicit
information to assist a scorer in making judgments about the teacher's
competency for these elements. Equitable treatment of students is not addressed
by the SSI-SM. To fully address this standard, a new task would need to be
developed, e.g., either vignettes of student misconduct or questions eliciting
a description of a teacher's student behavior management system.

Standard 26: Presentation Skills. Each candidate communicates effectively by
presenting ideas and instructions clearly and meaningfully to students. The
discussion of this standard addresses the linguistic complexity and nonverbal
aspects of a teacher's communication with students. Two of the SSI-SM tasks,
Evaluating Student Work and [illegible], yield information on teacher
explanations of concepts which could be scored under the scoring domain of
Content Pedagogy. Some aspects of a teacher's presentation during the
interview, e.g., clarity of explanations, the spontaneity and organization of
his or her responses, and degree of enthusiasm, might serve as a crude proxy
for his or her presentation skills in the classroom. However, the degree of
relationship between a teacher's behavior during the interview and behavior in
the classroom interacting with students, especially at the elementary level,
would need to be investigated before using interview behavior as a proxy with
any confidence.

Standard 27: Student Diagnosis, Achievement and Evaluation. Each candidate
identifies students' prior attainments, achieves significant instructional
objectives, and evaluates the achievements of the students in a class. Although
Evaluating Student Performance focuses on the remediation of student errors,
SSI-SM questions do not address the setting of high standards for achievement,
ascertaining prior attainments, and designing and interpreting both formal and
informal means of evaluation. A teacher might volunteer information addressing
these factors in this standard; any such information would probably be scored
under one of two scoring domains, Content Pedagogy or Knowledge of Students,
with the exact scoring depending on the nature of the information. Another task
would need to be developed to fully address this standard.

Standard 28: Cognitive Outcomes of Teaching. Each candidate improves the
ability of students in a class to evaluate information, think analytically, and
reach sound conclusions. Student outcomes are not directly addressed by the
SSI-SM, nor could they be addressed directly in an interview format. However,
the thrust of the scoring domain of Content Pedagogy is whether the teacher is
laying a cognitive foundation that enables the student to achieve understanding
of mathematical concepts and their interrelationships. Content pedagogy is both
one of the three scoring domains, and an indicator within the domain. The
ability to use subject-specific content pedagogy is strongly assessed, but the
degree to which one believes that cognitive outcomes of teaching are assessed
depends on the confidence that one has in the links between content pedagogy
and student outcomes. Lay audiences, including legislators, may desire more
direct evidence of cognitive outcomes of teaching than is possible in a
semi-structured interview format.

Standard 29: Affective Outcomes of Teaching. Each candidate fosters positive
student attitudes toward the subjects learned, the students themselves, and
their capacity to become independent learners. The encouragement of positive
interaction among students and the provision for independent learning
experiences are not addressed by any of the tasks in the SSI-SM. Student
motivation was discussed under Standard 25.

Standard 30: Capacity to Teach Crossculturally. Each candidate demonstrates
compatibility with, and ability to teach, students who are different from the
candidate. The differences between students and the candidate should include
ethnic, cultural, gender, linguistic and socioeconomic differences. This is not
addressed by the SSI-SM. Adaptation of the tasks or development of new tasks
would be required to address this standard. One possibility is to provide more
specific information about the classroom contexts for which the tasks are to be
performed, and add questions asking how the context influenced the candidate's
decisions.






Standard 31: Readiness for Diverse Responsibilities. Each candidate teaches
students of diverse ages and abilities, and assumes the responsibilities of
full-time teachers. Although this standard addresses student teaching
experience, it can be construed [illegible] is prepared to teach courses
spanning the curriculum [illegible] credential. The SSI-SM does not do this;
one possible revision [illegible] differing topics which occur at various
points in the curriculum.

Standard 32: Professional Obligations. Each candidate adheres to high
standards of professional conduct, cooperates effectively with other adults in
the school community, and develops professionally through self-assessment and
collegial interactions with other members of the profession. Neither respect
for students and their ideas nor relationships with other teachers are
addressed by the SSI-SM. This would require development of an additional task.

The extent to which the SSI-SM covers the California Standards for Beginning
Teachers is summarized in Table 4.3.



Job-relatedness 

Teachers were asked their opinion of the assessment's job-relatedness. Fourteen
of the teachers who completed the evaluation feedback form felt that all of the
major tasks were relevant; three teachers did not, and one did not respond. One
teacher with a positive response stated:

Yes, everything [in the SSI-SM] determines how successful 
my teaching is. 

Some teachers responded positively, but qualified their answers. One teacher,
for example, noted:

Relevant yes, but [the SSI-SM tasks] miss the critically 
important areas of classroom management and control. 

Of the three teachers who did not feel all the major tasks were relevant, one
teacher specifically criticized the emphasis on remediation of an individual
student error on a single problem in Evaluating Student Performance. The
teacher indicated this task is not realistic given the large class size in
California. Another teacher remarked, "Teaching is also a function of the
students in your class." FWL staff interpret this teacher's comment as pointing
out that the assessment does not capture a teacher's ability to tailor a lesson
to a particular group of students with which the teacher becomes increasingly
familiar over the school year.

All the CT assessors felt strongly that new teachers need the skills and
knowledge that are reflected in the assessment to perform competently as a new
teacher.






TABLE 4.3

EXTENT OF COVERAGE BY THE SSI-SM OF
CALIFORNIA STANDARDS FOR BEGINNING TEACHERS

Standard                                Method of Coverage              Extent of Coverage

22: Student Rapport and Classroom       Not covered.                    None
    Environment

23: Curricular and Instructional        Focus of two tasks.             Partial
    Planning Skills

24: Diverse and Appropriate Teaching    Covered by questions in two     Partial
                                        tasks. Breadth of content
                                        pedagogy and ability to
                                        design instruction taking
                                        students' ability and
                                        interests into account are
                                        major scoring components,
                                        though more questions
                                        should be added to fully
                                        assess abilities in this area.

25: Student Motivation, Involvement,    Covered by questions in two     Partial
    and Conduct                         tasks and two elements of
                                        scoring indicators.

26: Presentation Skills                 Not directly covered by tasks,  Partial
                                        questions, or indicators.

27: Student Diagnosis, Achievement,     Partial focus of one task.      Partial
    and Evaluation

28: Cognitive Outcomes of Teaching      Not directly covered.           None

29: Affective Outcomes of Teaching      Not covered.                    None

30: Capacity to Teach Crossculturally   Not covered.                    None

31: Readiness for Diverse               Not covered.                    None
    Responsibilities

32: Professional Obligations            Not covered.                    None






Appropriateness for Beginning Teachers



The appropriateness of the SSI-SM content for beginning teachers was to be
evaluated in two ways: (1) the perceptions of SSI-SM teachers, assessors, and other
observers and (2) the performance of teachers on the assessments.

Perceptions. When asked whether the mathematical topics and concepts chosen
for the assessment were appropriate for demonstrating their teaching skills, 16 of the 
teachers replied affirmatively. One teacher commented: 

Yes, the topics were basic enough so that even if you 
haven't taught the lesson, the topics were appropriate. 

Another teacher concurred: 

I haven't had to teach ratios/proportions, but probably will
some day.

One teacher, however, had exactly the opposite opinion. He found the
assessment to be inappropriate because he had not taught linear equations to his
seventh graders in the depth that he felt was required to respond to the
questions.

Another teacher stated that the topics and concepts chosen for the assessment 
were fair, but that the assessment "failed to really challenge." 

The CT assessors also believed that the subject matter content and tasks were
appropriate means of assessing new teachers. 

Eleven of the teachers believed that they had sufficient opportunities to acquire
the skills needed to respond to the tasks, six did not, and one was not sure. Of the 10
teachers identified as being in their first year, six believed the tasks were appropriate 
for beginning teachers, two did not, and two gave qualified answers. 

One teacher acknowledged the relevance of the tasks, but did not feel that his
education had prepared him to perform the tasks competently. Another teacher
implied that knowledge of how to adjust a lesson for a gifted or slow class
depended upon experience teaching those types of students.

Two second-year teachers indicated that they needed the second year of
experience to respond well to the questions. One teacher explained that if he
had been asked the same questions when he was a brand new teacher, his answers
would probably have been more idealistic and less likely to reflect "the
reality of the school."

All but one of the CT assessors felt that new teachers would have had the 
opportunity to acquire the skills and knowledge needed to respond to the assessment 
tasks. Two suggested that if many teachers were having a problem, then that would 
reflect inadequate preparation programs for mathematics teachers at the secondary 
level. 






The dissenting assessor thought that the tasks were appropriate but that some [...] 

Some teachers could (prior to the interview) not be at this [...] 

These topics are usually taught in 
two different semesters at the middle-school level. Ratio and 
proportions might not be taught in the high school position 
of a first-year teacher. However, I think that both stimuli 
are appropriate. Student-teaching experience might provide 
background on these topics. 

The scoring of the tapes occurred in the later stages [...]. Information on the 
performance of the teachers can be used to judge the appropriateness of the 
assessment for beginning teachers. The results and analyses of the scored 
interviews will be reported separately. 

[...] or no information about the tasks and scoring criteria, however, the [...] 
pilot testing [...] be incomplete at best. We will not [...] of teachers when they 
have adequate information, [...] administration, about what is expected and how it 
[...] generic features of the tasks, materials, questions, probes, [...] should be 
made prior to future field tests. 

Appropriateness across Contexts 

[...] the assessment is fair to teachers working in different [...] semi-structured 
interview [...] varying contexts (e.g., across grade levels, [...] groups, and in 
different school/community settings) [...]. Appropriateness across grade levels, 
subject areas, and diverse student groups is discussed below. 

[...] assessment was specifically tailored for secondary [...] mathematics, [...] a 
particular focus on the topics of linear equations and ratio and proportions. [...] 
teachers felt the assessment was appropriate across grade levels and subject areas, 
two teachers had different perspectives. One teacher stated he did not feel he had 
enough experience teaching [...]. 

[...] a viewpoint different from the majority felt that this type of assessment [...] 
appropriate for all subject areas, commenting, "The assessment [...] subject-matter 
courses like math and science [...] 

[...] of various constraints on the selection of the sample for this [...] the same 
geographical [...] contexts were represented. Urban areas were overrepresented 
[...] towns were not represented at all. Moreover, at least 
two of the teachers taught inner-city students in a context where the high-ability 
students were those who scored at the sixtieth percentile on achievement tests. None of 






the participating teachers, however, commented that the assessment was inappropriate 
for teachers in urban settings, of students of different ability levels, or of any other 
diverse student groups. 

Two of the six CT assessors also believed that the assessment was fair to teach- 
ers of diverse student groups, with one citing candidates who mentioned "their personal 
experiences with ESL children and alternate school settings." The other four assessors, 
however, did not believe that the assessment yielded any information about the ability 
of a new teacher to work with diverse student groups. 



Fairness across Groups of Teachers 

All of the responding teachers felt that the assessment is fair among teachers of 
different gender and ethnic groups. It should be noted, however, that only three 
minority teachers were assessed. The CT assessors also believed the assessment to be fair 
across teacher groups. 



Appropriateness as a Method of Assessment 

As an assessment method, the strength of the semi-structured interview is that it 
can measure teachers' awareness of and reasoning about their cognitive strategies. 
Teachers can describe what they know and explain how and why they would apply their 
knowledge in a variety of situations. Unlike selected response formats, ranges of 
responses and interpretations are possible and acceptable. However, semi-structured 
interviews share the challenges of all performance assessments: the standardization of 
tasks and questions, documentation of candidates' responses, and the application of 
explicit, uniform evaluation criteria for assessing performance. 



Assessment Format 



This section discusses the clarity, timing, and tasks of the SSI-SM, and summarizes 
the comments made by teachers concerning feedback on their performance. 
Eighteen of the 20 teachers who participated in the pilot test provided written input on 
the SSI-SM. This section is based on their comments as well as those received from the 
assessors and the observers from FWL and RMC. 



Clarity of Assessment 

Prior to the pilot test, teachers received a description of the assessment and the 
live tasks to be performed. Sixteen teachers reported that the written materials mailed 
to them were helpful; one did not find them helpful; and one qualified a positive 
response. Twelve teachers found the oral overview before the assessment to be helpful, 
while two teachers responded explicitly that it was not. The remaining four teachers 
gave varying responses, ranging from "N/A" to "A little" to a comment that the 
overview was repetitive. Suggestions for improving the orientation materials were given in 
the section on "Logistics." 






[...] agreed that the directions were clear; the other teacher [...] interview was 
easier because "you know what to [...]." [...] teachers found the questions and 
tasks understandable; three did not. [...] (evidently, "Structuring a Unit") 
confusing because "it was difficult to view each topic as being on the same level." 

[...] to make changes in the procedures or content of the tasks to accommodate 
teachers. The use of probes according to [...] to adequately adapt the tasks [...]. 

[...] the questions in the tasks and the scoring criteria did not always [...] the 
administration [...]. 

Timing 

[...] approximately the same length of time for each task [...] teachers were to 
move from one task to another. Assessors were asked to limit the time a teacher 
took to prepare for a task, although [...]. There were individual differences [...] 
length of time teachers took to complete a task, depending on the average length of 
explanations [...]. When teachers did not complete a task in the allotted time, 
often they were allowed a little additional time [...]. [...] finished a task early, 
the coordination room was available to gather and take a break. 

[...] satisfactory; two did not; and one teacher did not respond to the [...]. 

[...] also merits further investigation. Structuring a Unit and [...] more time 
consuming than other tasks. [...] considered to see if alternative task [...]. 

Nature of the Tasks 

[...] teachers respond to stimulus materials that are intended to simulate [...] 
normally encounter. Yet some of the tasks include [...] materials [...]. 
Structuring a Unit presents concepts and topics on separate cards [...] from a 
textbook, but no array of supplementary materials, such as the teacher's guide; and 
in Alternative Pedagogical Approaches, [...] only sketchily described. California 
educators [...] wish to consider whether the type and range of resources in the 
SSI-SM represent the 






materials that California teachers are encouraged to use in organizing and presenting 
instruction in mathematics. 

Teacher Preferences about Feedback 

When asked what type of feedback they desired, six of the 18 teachers specifically 
mentioned information on their own strengths and weaknesses as reflected by their 
responses to the SSI-SM. Three wanted information either about scoring criteria or 
what the assessment was looking for in specific questions. Two wanted information so 
they could compare their own responses with those of others. Three teachers specified 
written feedback, two written or oral, and one wanted to watch the videotapes and then 
discuss his/her performance. 

As to who should provide the feedback, two teachers suggested the interviewers, 
two others desired feedback from a committee, one teacher mentioned a master 
teacher, and another teacher recommended someone other than the teacher's supervisor. 

Cost Analysis 

SSI Cost Estimates 

We can use the experience of administering the SSI-SM pilot tests as a basis to 
estimate the costs of implementing a semi-structured interview as a credential 
requirement in California. To review, the SSI-SM was administered in a setting in which four 
teachers rotated among four assessors in a half-day interview. Thus, four assessors could 
administer eight half-day interviews in one day. Assessors also would need some time to 
prepare for the interview and to summarize their notes and evaluations. The scoring 
system for the SSI-SM was not developed such that the interviewers could score and 
evaluate the interviews in the pilot test. To minimize costs, it would be helpful to have 
the interviewers evaluate and score the teacher interviews. Assuming that the 
interviewers could conduct the interviews and score them by allowing an additional two hours for 
preparation and evaluation, it would take approximately five hours of assessor time to 
complete the interview and score a half-day interview. Using the rate of $20/hour from 
our pilot testing yields the cost of $100/half-day interview for the assessor time. 

Using the training cost assumptions of $7/assessment that were used for 
the CCI, and the other costs for phone, postage, etc. of $30/assessment, the interviews 
would require approximately $137/half-day interview for each teacher. 
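
The arithmetic behind this estimate can be sketched as follows. This is only an illustrative calculation using the figures quoted above ($20/hour, five assessor hours, $7 training, $30 other costs); the function and variable names are our own and do not come from the pilot test materials.

```python
# Illustrative cost sketch; all names are hypothetical, figures are from the text.
ASSESSOR_RATE_PER_HOUR = 20   # dollars per hour, rate used in the pilot test
ASSESSOR_HOURS = 5            # interview time plus two hours of preparation and scoring
TRAINING_COST = 7             # dollars per assessment, assumption carried over from the text
OTHER_COSTS = 30              # dollars per assessment for phone, postage, etc.


def cost_per_interview(rate=ASSESSOR_RATE_PER_HOUR, hours=ASSESSOR_HOURS,
                       training=TRAINING_COST, other=OTHER_COSTS):
    """Return the estimated cost in dollars of one half-day interview for one teacher."""
    assessor_time = rate * hours          # $100 with the default figures
    return assessor_time + training + other


if __name__ == "__main__":
    print(cost_per_interview())  # 137
```

Changing any single assumption (for example, a higher hourly rate) shows how sensitive the per-teacher figure is to assessor time, which is the largest component.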

Again, caution should be used in interpreting these figures since final costs will 
depend upon the actual requirements of the assessment, and of the system within which 
the interviews are placed. Furthermore, this analysis makes no assumptions about the 
manner in which the costs would be supported, e.g., charged to teacher candidates, 
supported by district or other staff, or supported by state agencies. 






Technical Quality 



Development 



The development of the SSI-SM has been based on both theory and research [...]. 
Early versions of the interview protocols were reviewed by groups of [...]. The 
scoring system is in the final stages of development [...]. [...] completed 
assessment package will be reviewed by [...] measurement to evaluate the assessment 
as a whole [...] strengths and weaknesses in relation to alternative approaches [...]. 

[...] stages of development, revision of the tasks to align them [...] closely with 
the scoring criteria will necessitate at least one further round of [...] 
assessment can be considered ready for field testing. 



Reliability 



[...] the SSI-SM needs to be examined in several respects: (1) consistency of [...]; 
(2) consistency of a teacher's performance across [...]; (3) [...] tasks; and (4) the 
internal consistency [...]. Pilot test data for the SSI-SM provide some initial 
[...] scoring system. Appendix [...] on the internal consistency and interrater 
[...]. Basically, the information on interrater reliability [...] agreement among 
the raters on the initial independent ratings and consensus ratings after raters 
discussed and revised their ratings. The percent agreement [...] following page. 
[...] were exactly the same. Improvements in these agreements [...] further 
development in scoring and training of [...] to achieve agreement among raters in 
an interview assessment such as the SSI-SM. 

[...] not sufficient to judge the degree to which reliable formative feedback [...] 
be given within indicators, tasks or topics. However, the [...] evidence that [...] 
like this can produce reliable decisions about individual candidates. 

[...] of the SSI-SM and small numbers of cases on which [...] limit making any 
specific conclusions about its [...]. [...] agreements achieved among raters and the 
internal consistency [...] exhibited suggest that assessments using this approach 
have the potential [...] to provide reliable information on individual candidates. 
[...] potential would depend on further developments and refinements to the 
interview and scoring system and training. 






TABLE 4.4 

PERCENTAGE OF RATER EXACT AND ADJACENT AGREEMENT 
BY TASK AND TOPIC PAIRS* 

Pairs      Percent Exact      Percent Adjacent 

Pair 1          54                  90 
Pair 2          52                  90 
Pair 3          58                  96 
Pair 4          60                  98 

Pair 1 - Task 1, Linear Equations 
Pair 2 - Task 3, Linear Equations 
Pair 3 - Task 2, Ratio & Proportions 
Pair 4 - Task 3, Ratio & Proportions 

*This table represents mean independent rater agreements across the five 
indicators, across the ten examinees. Exact agreement means that each 
rater assigned the same rating as their rating pair. Adjacent agreement 
means that each rater was within one point of their rating pair on 
the assigned rating. Please note that the following analysis is tentative 
and has not as yet been verified by the SAS analysis. 
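
The two agreement statistics defined in the table note can be illustrated with a short sketch. The ratings below are invented for illustration and are not pilot test data; the function name is hypothetical, and we assume (as the note's wording suggests) that adjacent agreement counts exact matches as well as ratings one point apart.

```python
def agreement_rates(ratings_a, ratings_b):
    """Return (percent exact, percent adjacent) agreement for two raters.

    Exact: both raters assigned the same rating.
    Adjacent: ratings are within one point of each other (includes exact matches).
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and of equal length")
    n = len(ratings_a)
    exact = sum(a == b for a, b in zip(ratings_a, ratings_b))
    adjacent = sum(abs(a - b) <= 1 for a, b in zip(ratings_a, ratings_b))
    return 100 * exact / n, 100 * adjacent / n


if __name__ == "__main__":
    # Hypothetical ratings by two raters on five indicators for one examinee.
    rater_1 = [3, 4, 2, 5, 1]
    rater_2 = [3, 3, 2, 2, 1]
    print(agreement_rates(rater_1, rater_2))  # (60.0, 80.0)
```

Averaging such percentages over indicators and examinees would give figures of the kind reported in each row of the table.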






Validity 



[...] SSI-SM have been subjected to some forms of judgmental 
validity through reviews by Connecticut mathematics educators. More content 
validation [...] differing assessment formats, is 
indicated. These investigations should also include various forms of empirical validity, 
such as whether the assessment discriminates between beginning and experienced 
teachers, and between beginning teachers who are identified as more or less effective by 
other means. 



Conclusions and Recommendations 



This section contains conclusions and recommendations regarding the SSI-SM 
organized into the areas of administration, scoring, content, format, and a brief 
summary. 



Administration of Assessment 



The SSI-SM is very labor intensive; at the present time, administration and 
scoring require one day per teacher. If on-line scoring is developed and found to be 
feasible, the overall time would be reduced slightly, but our experience with the CCI 
suggests that forming and documenting judgments takes a considerable amount of time. 

Factors which are key to smooth implementation of the SSI-SM include: 

o availability of appropriate facilities (which are often difficult to locate); 

o development of clear orientation materials for teachers, including 
descriptions of the tasks and scoring criteria; 

o organization of the assessment so all tasks take approximately equal 
amounts of time and only a minimal number of transitions between 
tasks are needed; 

o if assessments are videotaped, arrangements for a technical consultant 
and familiarization of assessors with the equipment; and 

o recruitment of assessors and scorers who are knowledgeable about 
mathematics, mathematics pedagogy, and student characteristics. 

The cost of administering this assessment could be reduced by substituting 
audiotaping for videotaping as the form of documentation. 

Scoring 

The scoring system of the SSI-SM holds great potential for a multidimensional 
assessment of teaching competency. Its strengths include broad applicability across 






tasks, topics, and teaching styles and philosophies. Furthermore, the system's focus on 
three broad domains of teaching (Curriculum Content, Content Pedagogy, and 
Knowledge of Students) makes it likely that it will be suitable for semi-structured interviews in 
other subject areas. The latest pilot test of the scoring system suggests that despite 
reliance on professional judgment rather than checklists of observable behaviors, a high 
degree of interrater reliability can be obtained. 

Before the SSI-SM scoring system is adopted, however, the following aspects 
need improvement: 

o greater alignment between questions and indicators; and 

o redevelopment of indicators within the Content Pedagogy domain so 
that indicators are both comparable in scope and representative of all 
significant competencies falling within that domain. 

The feasibility of simultaneously scoring and administering the assessment should 
be explored, including the identification of problems in combining the two roles and 
possible effects on interaction between the interviewer and the candidate. 



Assessment Content 

Our observations and information collected from assessors, scorers, and teachers 
participating in the pilot test suggest the following conclusions about content: 

o The assessment content is in line with the philosophy of the Mathematics 
Framework. Congruence of the tasks is good, though not complete, 
with respect to the areas of emphasis and characteristics of delivery of 
instruction. 

o The two topics constituting the content of the SSI-SM did not reflect 
the diversity of topics in the secondary mathematics curriculum. 

o Coverage of the California Standards for Beginning Teachers is poor, 
but could be improved by refining current tasks and developing several 
new ones. Some standards that address teacher-student interaction or 
student outcomes could only be indirectly assessed. 

o Questions and tasks seem to tap important teaching competencies 
which are needed by beginning teachers to teach effectively. 

o For the most part, the teachers considered the content to be fair with 
respect to assessing diverse groups of teachers from varying teaching 
contexts. 

o The majority of teachers who participated in the assessment judged the 
tasks to be job-related. In many cases, however, teachers indicated that 
they had not received instruction or training in performing these tasks. 






o Knowledge of how to teach in a variety of contexts or to diverse 
student populations was not assessed well. 

o [...] focal topics on teacher performance [...] teachers are issues which 
could not be explored with available pilot data. 

[...] development, it would benefit from a broad review by a [...] mathematics, 
mathematics pedagogy, [...] educational policy. California teachers and teacher 
[...] of [...] panel. Such a review should be directed toward [...] 
representativeness of the content with respect to a secondary mathematics [...] 
conclusions that can be drawn from [...] identifying potential weaknesses in the 
instrument which could be remedied before incurring the expense of field testing. 

Assessment Format 

[...] discussed at length in Chapter 8 and contrasted with [...] multiple-choice 
examination methods of teacher assessment. [...] the assessment of a teacher's 
ability to plan instruction and [...] understand the subject at a conceptual level 
of understanding. [...] to implement instruction and manage the classroom. 



Summary 



If semi-structured interviews are selected as a method of assessing new teachers 
for credentialing purposes, the SSI-SM has high potential for serving as a prototype 
[...] closer alignment of the questions and the indicators, and pilot 
testing of the revised version before it can be considered ready for a field test. 






CHAPTER 5: 

SEMI-STRUCTURED INTERVIEW: ELEMENTARY MATHEMATICS 






The Semi-Structured Interview in Elementary Mathematics (SSI-EM) was de- 
veloped by the Stanford Teacher Assessment Project (TAP). The version of the SSI-EM 
used in the Spring 1989 pilot test was not a final product, but rather a revised version of 
an assessment for certifying distinguished master teachers. Stanford had previously pilot 
tested an earlier version of the assessment with a sample consisting mostly of experi- 
enced teachers, but a few first-year teachers and student teachers had also been as- 
sessed. Based on this experience, the version used in the earlier pilot test was revised 
for use with beginning teachers. 

The assessment consists of a series of semi-structured interviews addressing four 
tasks. The candidate performs a task and then is interviewed. The four tasks are de- 
scribed below. 

(1) Lesson Planning: A teacher receives 30 minutes to plan a lesson on a 
given topic for a fifth-grade class, and then responds to questions about 
that lesson. 

(2) Topic Sequencing: Using a set of 17 cards representing mathematical 
topics in a unit, a teacher sorts the cards into groups of topics, selects 
the cards representing the major themes of the unit, defines the topic 
on each card, and arranges eight of the cards in the order of perceived 
difficulty for students (least difficult to most difficult). 

(3) Instructional Vignettes: A teacher responds to a series of hypothetical 
situations involving students in after-school tutoring sessions. 

(4) Short Cuts: A teacher is presented with two purported computational 
shortcuts or rules of thumb for solving mathematical problems and 
evaluates them in terms of their pedagogical and mathematical sound- 
ness. 

This assessment closely resembles the SSI-SM (which was the subject of Chapter 
4). The SSI-EM and the SSI-SM are constructed in a similar manner in that they com- 
bine two assessment strategies, the semi-structured interview and the assessment center, 
which were described previously in the discussion of the SSI-SM. However, the set of 
tasks differ; those tasks which are most similar have slightly differing foci. 

The administration of the assessment, the assessment content, and the assessment 
format are discussed below. The discussion of the SSI-EM concludes with a summary of 
our evaluations of the SSI-EM's potential as a prototype for further assessment 
development in elementary mathematics, as well as other areas of teacher performance. 






Administration of Assessment 



This section begins with an overview of the administration of the assessment. It 
is followed by a discussion of the logistics involved in arranging the pilot test. 



Overview 



The administration of the SSI-EM occurred between May 30 and June 24, 1989. 
[...] interviewed, the majority of whom [...] minority teachers participated in the 
assessment. Grade levels [...] also shown in Table 5.1. The table shows that while 
most of the [...] grade, nearly one fourth taught a combination of 
grades. Two teachers taught in a middle school. 

[...] collected on the length of the participants' prior teaching experience. 
[...] all teachers participating in new teacher support projects were to be in 
[...], a miscommunication resulted in the inclusion of [...] experience in the 
initial administration of the SSI-EM assessment. 

[...] of the assessment were piloted. One focused exclusively on elementary 
fractions [...] of two tasks (Lesson Planning and Topic Sequencing) [...] on ratio 
and proportions, and two (Instructional Vignettes and Short Cuts) that [...]. [...] 
the number of teachers who participated [...] for Instructional Vignettes and Short 
Cuts; 25 for Lesson Planning (elementary fractions); 16 for Lesson Planning (ratio 
and proportions); 24 for Topic Sequencing (elementary fractions); and 17 for Topic 
Sequencing (ratio and proportions). 

Logistics 

[...] activities for this assessment included (1) developing orientation 
information [...] to teachers and principals, (2) identifying teacher samples, 
(3) identifying [...], (4) [...] administrations, (5) making facility and [...] 
arrangements, (6) gathering the materials to conduct the assessment, and 
(7) arranging [...]. Logistical activities are 
described in detail in the Administration Report for Spring, 1989. 

[...] SSI-SM, orientation materials sent to the teachers were limited in scope. 
[...] the tasks were identified and the structure of the assessment center activities 
[...] markedly from other assessments of [...] more familiar to teachers, the 
quality of the orientation materials [...] to anticipate activities and prepare for 
the assessment. The [...] previously described in the logistics section of the 
discussion of the SSI-SM. 






TABLE 5.1 

SEMI-STRUCTURED INTERVIEW IN ELEMENTARY MATHEMATICS: 
PILOT TEST PARTICIPANTS 

(Total Number of Teachers = 41) 

Descriptive Characteristics      Distributions of 
of Participants                  Participants 

Grade Level 
   4/5                                4 
   5                                 16 
   5/6                                4 
   6                                 14 
   6/7                                2 
   Not Specified                      1 

Gender 
   Male                              10 
   Female                            31 

Ethnicity 
   Asian                              2 
   Black                              2 
   Hispanic                           1 
   White                             34 
   No Response                        2 

Location of Assessment 
   Anaheim 
   Fresno 
   San Diego 
   Ventura 
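
As a rough consistency check on the grade-level figures in Table 5.1, the counts can be totaled and the share of combination-grade teachers computed. This sketch assumes the garbled grade 5 entry in the source reads 16, which is the value that makes the column sum to the stated total of 41; the dictionary and variable names are ours.

```python
# Grade-level counts as read from Table 5.1 (the grade 5 count of 16 is an assumption).
grade_level_counts = {
    "4/5": 4,
    "5": 16,
    "5/6": 4,
    "6": 14,
    "6/7": 2,
    "Not Specified": 1,
}

total_teachers = sum(grade_level_counts.values())

# Combination classes are the entries that span two grades, written with a slash.
combination_teachers = sum(
    count for grade, count in grade_level_counts.items() if "/" in grade
)
combination_share = combination_teachers / total_teachers

if __name__ == "__main__":
    print(total_teachers)                 # 41
    print(combination_teachers)           # 10
    print(round(combination_share, 2))    # 0.24
```

Under this reading, 10 of 41 teachers (about 24 percent) taught a combination of grades, which is consistent with the text's "nearly one fourth."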








[...] scheduling were experienced, which should be kept in mind if 
a semi-structured interview is considered for statewide adoption for credentialing 
purposes. Some teachers are on a year-round teaching schedule, complicating efforts to 
select convenient times for assessments. Those year-round teachers who began school 
[...] three months more experience than teachers on more traditional 
schedules [...] year. Our sense is that this difference in experience 
probably [...] on performance on the SSI-EM; however, this should be 
investigated for all teaching competencies assessed. 

[...] appropriate facilities with large numbers of small 
rooms proved to be challenging. Scheduling assessment center activities at times of the 
year when schools and universities are not in session would make these facilities 
available [...] problem. However, there may be costs associated with 
reopening closed facilities (e.g., heat, custodial services). 

[...] audiotaped with one tape recorder; the taping quality was 
checked at the beginning of each interview. For three tapes, data were lost due to 
failure to tape the interview at some point. This could be avoided by adopting a policy that 
once the tape recorder is started, it is never turned off, even if it records substantial 
[...] does not speak as they are working on a solution to a problem. 
[...] scorer, but another scorer rated 
it with no reported difficulty. 



Security 

[...] different versions of the assessment tasks had previously been 
administered to master teachers during the initial Stanford TAP pilot testing. Reports of the 
[...] contained the previous protocols had been distributed by the 
[...]. Therefore, only minimal security precautions were taken. 

Assessors carried the interview protocols with them at all times or left them in 
securely locked rooms during assessments. Teacher notes were collected at the end of 
each task and disposed of at the end of each day. 

As with all assessments, security of the documentation of teacher performance 
and evaluations is required. The extent to which a semi-structured interview needs 
redevelopment for each administration is unclear. The nature of a semi-structured 
[...] security is compromised after each administration. While [...] 
memorable, the tasks certainly are. One teacher who 
taught at the same school as a teacher who had taken the SSI-EM earlier in the week 
told us that the other teacher had described the experience. She believed that it was 
[...] about the test content from someone who had previously 
taken the test. If semi-structured interviews are to be used for credentialing and thus 
become a high-stakes assessment, the contents will be quickly made public. 

Although teachers could ascertain the topics and the thrust of the questions, the 
degree to which prior preparation would substantially compromise the validity of the test 
is unclear. The Stanford TAP advocates informing teachers of the focal topic of the test 
in advance to allow teachers to prepare. The questions in each task are interrelated, 






making memorization of appropriate responses difficult. Some questions and scoring 
criteria assess a relatively sophisticated knowledge of mathematics and mathematics 
pedagogy which would be difficult to acquire in a short period of time; others would be 
more susceptible to memorizing formulaic answers. 

The state of Georgia abandoned a semi-structured interview as a certification 
requirement when standardized answers were developed and taught to candidates, 
affecting [...] of the interview data. Whether the level of performance demanded by 
the SSI-EM is sufficiently complex to inhibit the utility of similar strategies is an issue 
which needs to be explored prior to its use for credentialing. 

For security purposes, topics, and perhaps tasks, should be varied across assess- 
ment dates. However, examples of questions and responses which indicate the level of 
performance required to pass, together with descriptions of the tasks, should be given as 
information to candidates who are planning to participate in the assessment. 

Assessors and Their Training 

Assessors for the SSI-EM need to be knowledgeable about mathematics and 
mathematics pedagogy at the elementary level. They must also have good interviewing 
skills. FWL staff recruited seven California educators to administer the SSI-EM. All 
but two had taught elementary school. Two of these were retired administrators, and 
two others were elementary teachers on sabbaticals who had been working with a 
mathematics project at a local university. Of the two assessors with no elementary 
teaching experience, one had designed elementary math curricula, and the other had a 
strong math background. In addition to these, when necessary, three FWL staff 
members also served as assessors to complete an assessment team. 

The seven recruited assessors, along with two staff members from FWL, received 
training in administration of the SSI-EM. Training sessions were conducted on May 
18 and 19 by one of the original test developers, a representative from the Stanford 
Teacher Assessment Project. Each assessor was trained to administer two of the tasks, 
either Lesson Planning and Topic Sequencing or Instructional Vignettes and Short Cuts. 
The training consisted of (1) an overview of the assessment project, including its purpose 
and its relation to the California New Teacher Project; (2) a general orientation to the 
purpose of the assessment; (3) an overview of each task; and (4) paired practice in 
administering the two tasks. 

Assessors all felt that their training had been adequate, although two mentioned 
that they would have welcomed more discussion of the specific intent of the questions 
and skills being assessed. FWL staff who administered tasks felt that it was difficult to 
construct effective probes when it was unclear as to what information was actually being 
sought through the questions. This was less a problem in the design of training than a 
result of the stage of development of the scoring system, which was due to be extensively 
revised and therefore was not described to the assessors. 

During this pilot test the assessors were allowed to ask probing questions at their 
own discretion. They were carefully instructed to use probes that aimed at clarification 
or expansion of a teacher's response or lack of response, and not to hint at a correct 
answer. Assessors were monitored during the training session to see if they were able to
adhere to this admonition. However, feedback from scorers indicates that there were
instances when teachers were guided to the correct answer. 

Assessor opinions on how often assessors should be retrained if they did not
administer the protocol at least monthly ranged from every three or four months to once
a year. Two assessors thought that retraining was unnecessary, but that either a review or an
update would be useful if an assessor had not administered the task in some time or if
changes had been made in the instrument.



Teacher and Assessor Impressions of Administration 

Teachers and assessors were pleased with most aspects of the arrangements.
When asked whether the arrangements (e.g., scheduling, facilities, distance to travel to
assessment site, breaks and lunch, room arrangement) were reasonable, 30 of the 40 
teachers returning surveys responded affirmatively, four teachers negatively, and two 
teachers responded positively but identified particular aspects that were unsuitable. 
Nine teachers commented that the assessment should have been scheduled earlier in 
the school year. 



Scoring 

During the previous administration of the SSI-EM tasks by the Stanford TAP,
separate scoring systems for each task were developed. For the CNTP pilot test, a TAP
representative developed a new scoring system with more similarity across tasks for
scoring the SSI-EM. The scoring scale was also changed from a six-point scale to a
two-point pass/fail scale, with "unable to score" being a third option. Specific components
of the teacher's performance were rated. Ratings were generally holistic, though
a few categories consisted of checklists of major points to be covered in the response.
To pass, a teacher needed to pass more than two-thirds of the components, so passing
rates were set at 70% for Topic Sequencing and Instructional Vignettes, while Lesson
Planning, which had more components, required the teacher to pass at least 75% of the
components. The scoring system for Short Cuts differed from the format of the other
tasks in that the summative judgments for performance on each individual short cut and
the entire task were holistic. Appendix B lists the scoring categories for each task.

The scoring system emphasized certain aspects of teacher knowledge, including: 

o knowing various ways to organize topics in the discipline;

o knowing what's difficult, easy, and important within a topic and why;

o knowing multiple ways to represent topics that make it easy for others
to comprehend; and

o anticipating misconceptions and preconceptions that students have about
the content.






The interview protocols were revised on the basis of the previous pilot test
experience to clarify the questions and eliminate questions which did not seem to elicit
useful information. The scoring criteria were adapted as well. Criteria which were
deemed to be too difficult for beginning teachers were dropped. The revised scoring
criteria were also made more specific than the original criteria to resolve application
problems that had been reported by scorers. Some portions of the interview were
excluded from scoring because they did not elicit useful information.

Scoring Process 

Scorers listened to taped interviews and rated the teacher responses with the use
of task-specific categories. The number of categories per task varied from 14 for Topic
Sequencing to 20 for Lesson Planning. There were no categories which were rated for
more than one task. A few categories were check lists, e.g., the teacher mentioned
three out of four specified aspects of the topic which students find difficult. Most
required holistic judgments, e.g., the teacher's description of the strengths of a particular
short cut was satisfactory. If sufficient information was not available to address a
scoring category, the category was omitted for that teacher. To determine a pass/fail
score, with the exception of Short Cuts, the number of categories receiving a passing
score was divided by the total number of categories scored for each task and the percentage
of categories passed was calculated. Passing scores were set at 75% for Lesson
Planning and 70% for Topic Sequencing and Instructional Vignettes by the TAP representative.
The reason that the passing scores differed between Lesson Planning and the
other tasks was that there were fewer scoring categories in the other tasks, so each
individual category had more weight when computing the final score. For Short Cuts, a
teacher's evaluation of each of two algorithmic short cuts was rated separately. First,
various aspects of the teacher's response were rated on a pass/fail dimension to assist in
arriving at a pass/fail score for the entire set of responses for the short cut. The
judgments for the two short cuts were compared. If they agreed, then the entire task
received the common rating. If the two ratings differed, then the evidence was compared;
if the weight of evidence did not clearly support an overall score (e.g., if the
teacher received a marginal pass on one short cut and a marginal fail on the other),
then the task was assigned the score of the first short cut, which was deemed to be
more representative of a teacher's competence by the test developers.
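
The pass/fail arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the TAP scoring procedure itself: the function names, the True/False/None encoding of category ratings, the optional weight_of_evidence argument, and the treatment of a task with no scorable categories as "unable to score" are hypothetical. Only the 75% and 70% passing scores, the omission of unscorable categories, and the first-short-cut tie-break are taken from the text.

```python
from fractions import Fraction

# Passing scores reported in the text (share of scored categories passed).
PASSING_SCORES = {
    "Lesson Planning": Fraction(75, 100),
    "Topic Sequencing": Fraction(70, 100),
    "Instructional Vignettes": Fraction(70, 100),
}


def task_result(task, ratings):
    """Return "pass", "fail", or "unable to score" for one teacher on one task.

    ratings holds one entry per scoring category: True (pass), False (fail),
    or None when too little information was available (hypothetical encoding).
    """
    # Categories without sufficient information are omitted from the total.
    scored = [r for r in ratings if r is not None]
    if not scored:
        # Assumption: a task with no scorable categories cannot be rated.
        return "unable to score"
    share_passed = Fraction(sum(scored), len(scored))
    return "pass" if share_passed >= PASSING_SCORES[task] else "fail"


def short_cuts_result(first, second, weight_of_evidence=None):
    """Combine the holistic "pass"/"fail" ratings of the two short cuts.

    weight_of_evidence is the scorer's overall judgment when the two ratings
    differ; None means the evidence did not clearly support either score.
    """
    if first == second:
        return first
    if weight_of_evidence is not None:
        return weight_of_evidence
    # Tie-break from the text: fall back to the rating of the first short cut.
    return first
```

With 14 Topic Sequencing categories, for example, passing 10 gives 10/14 (about 71%), which clears the 70% score, while passing 14 of 20 Lesson Planning categories gives exactly 70% and falls short of 75%.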

Discussion of Scoring System 

For the most part, the SSI-EM scoring system was not a check list of features in
a teacher's response, but relied on professional judgments to evaluate teacher responses.
This increased the ability of the system to apply across differing teaching
styles and approaches, but it increased the difficulty in training scorers to the same
standard, especially since examples representing the total range of potential responses
were not available.

The scoring system does not generalize across tasks or even across the same task
focusing on different topics. It is also insensitive to differences in the quality of responses;
clearly superior responses receive the same credit as marginally acceptable
ones. (This is not a problem for credentialing decisions, but limits its utility for professional
development purposes.) Although scorers generally felt that the scores for each
task reflected their summative evaluation of a teacher's performance, they also reported
some occasions when they were unable to score what they considered to be significant
aspects of a teacher's response due to the lack of a relevant scoring category. In its
choices of scoring categories, the scoring system contains implicit judgments about the
importance of particular aspects of a teacher's potential response. These choices are
likely to be the subject of debate within the community of mathematics educators.
Because of the greater degree of specificity compared to the scoring system of the SSI-SM,
a professional consensus on scoring categories is likely to be more difficult to
achieve.



As with the scoring system for the SSI-SM, the after-the-fact development of the
SSI-EM scoring resulted in a misalignment between questions and scoring criteria.
While assessors for the SSI-EM were invited to probe for clarification, they were unable
to do so effectively because they did not know the scoring criteria. This resulted in
numerous instances where there was insufficient information to rate some specific
components of the responses.

Scorers and Their Training 

FWL staff recruited scorers from the San Francisco Bay Area on the basis of referrals
from local mathematics and science programs for teachers. The required qualifications
for scorers were experience and training in mathematics education and/or required
elementary education. Experience in conducting observations or evaluations of teachers
was a desirable qualification. Due to unavailability or lack of proximity to the training
site, only one of the seven assessors participated in the scoring. Of the eight scorers,
five were present or former elementary school teachers. Three scorers were teachers
on sabbatical; one was a professor of mathematics education at a local university; two
were math education consultants; and two were doctoral students with interests in math
and science education.

Each scorer was trained to score two of the tasks, and provisions were made for
double scoring four to six tapes to serve as a rough reliability check. The training of
scorers was conducted by the TAP representative and a FWL staff member; it consisted
of two phases. In the first phase, each scorer received one-half day training per task.
Scorers received the interview protocols in advance and became familiar with the protocols
prior to the training. At the training session, scorers were given an overview of the
California New Teacher Project and the SSI-EM. The scoring guide, which described
each scoring category and how to rate it, was then handed out and explained in detail.
Two transcripts of previously scored interviews were provided, and the reasons for the
scoring were explained. Scorers then listened as a group to a tape and practiced scoring
it. Then scores were compared and questions about application of the scoring rules
were answered by the trainer. All scorers for a given task were then given the same
four tapes to score as a preliminary reliability check. They returned a week later for
the second phase of scoring training, meeting for an hour and a half with the trainer to
compare their scoring decisions and clarify how to apply the scoring criteria.

At the end of the training, each scorer was given 18 or 19 tapes to score.






This level of training did not prove to be adequate for most tasks. As a rough
check on reliability, pairs of scorers rated four tapes for Lesson Planning and Topic
Sequencing (however, one scorer found much of one tape for Lesson Planning-Ratios
inaudible) and six tapes for Instructional Vignettes and Short Cuts. The percentage of
agreements between raters on pass/fail judgments for each task was 50% for Lesson
Planning-Fractions, 67% for Lesson Planning-Ratios, 100% for Topic Sequencing-Fractions,
25% for Topic Sequencing-Ratios, 100% for Instructional Vignettes, and 50% for
Short Cuts. In many cases, the difference in rating particular scoring categories hinged
on a single piece of evidence which one scorer had heard and the other had not.
Sometimes, ratings varied due to differences in interpretation of teacher comments.
This was especially common for responses deemed to be borderline.
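
The agreement figures above are simple percentages of matching pass/fail judgments. A minimal Python sketch of that calculation follows; the function name and the sample ratings are hypothetical and are not the pilot test data.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of tapes on which two scorers gave the same pass/fail judgment."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both scorers must rate the same non-empty set of tapes")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical ratings for four double-scored tapes (illustration only).
scorer_1 = ["pass", "fail", "pass", "pass"]
scorer_2 = ["pass", "pass", "fail", "pass"]
agreement = percent_agreement(scorer_1, scorer_2)  # two of four tapes match
```

Simple percent agreement does not correct for agreement expected by chance, and with only three to six tapes per task a single disagreement shifts the figure by 17 to 33 percentage points, which is consistent with the report's description of these figures as a rough check.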

All but one of the scorers rated their training as very good, with the other scorer 
rating the training as adequate. Suggestions for improvement included more examples 
spanning the range of performances, more opportunities to compare and discuss ratings
with the other scorers, and reorganizing the scoring guide for Instructional Vignettes to
correspond with the scoring sheet. Scorers generally believed that the scoring guidelines
were complete and clear, but three scorers called for more detailed examples; they felt 
there was too little guidance for scoring answers which were on the borderline between 
acceptable and unacceptable. 



Teacher Preferences about Feedback 

Teachers were asked what types of feedback would be most useful and by whom,
when, and in what format the feedback should be provided. Of the teachers expressing
an opinion, the most popular response (by 18 of the 40 teachers) was that feedback
should identify the strengths and weaknesses of a teacher. Eight teachers desired
suggestions for improvement, three wanted a summary of all results and scores, three
wished to know if they had given the correct answers, and three wanted the feedback to
explain the purpose of the assessment and to identify evaluation criteria.

Teachers also gave a range of answers about other questions related to feedback.
Of those specifying a time frame, seven said immediately (presumably by the interviewers),
and three said as soon as possible after the assessment. In terms of format, seven
teachers preferred a written format, three said oral (in addition to the seven who said
immediately, which presumably would be oral), and three specified either written or oral
feedback. One teacher requested that the feedback be mailed to a teacher's home. In
terms of who should provide the feedback, nine teachers requested an impartial party
such as a testing service; seven selected the interviewers; four specified another educator
(such as a mentor teacher); and two felt that feedback should be channeled through
the New Teacher Support Project.



Assessment Content



The SSI-EM was originally designed to assess master or expert teachers. As a
result, the tasks emphasize both mathematical knowledge and state-of-the-art knowledge
of how to teach elementary mathematics (pedagogical content knowledge). More general
pedagogical skills, such as classroom management, receive little attention. Thus, the
assessment addresses subject-matter competency, and not more general teaching skills.

The assessment was neither developed for beginning teachers nor to be congruent
with California's curricular emphasis. However, an analysis of the appropriateness
of its content for beginning teachers and its congruence with the emphases of the State
mathematics curriculum guides and frameworks and California Standards for Beginning
Teachers can suggest the form that an assessment of beginning teachers in elementary
mathematics could take.

This section evaluates content-related aspects of the SSI-EM along the following
dimensions:

o Congruence with Curriculum Guide and Framework emphases;

o Coverage of the California Standards for Beginning Teachers;

o Job-relatedness;

o Appropriateness for beginning teachers;

o Appropriateness across teaching contexts;

o Fairness across groups of teachers; and

o Appropriateness as a method of assessment.

Congruence with emphases of the relevant curriculum guide is addressed first. 

Congruence with California Curriculum Guides and Frameworks

The 1985 Mathematics Framework for California Public Schools: Kindergarten
Through Grade Twelve identifies five major areas of emphasis: (1) problem solving, (2)
calculator technology, (3) computational skills, (4) estimation and mental arithmetic, and
(5) computers in mathematics education. In the SSI-EM, one or more tasks addressed
the areas of calculator technology, computational skills and estimation. Problem solving
was indirectly addressed, while mental arithmetic and computers in mathematics education
were not addressed at all. The specific ways in which the areas were included in
the assessment are described as follows.

o Although problem solving was not a direct area of focus of the assessment,
the SSI-EM scoring criteria stress conceptual understanding of
mathematical algorithms and terms, which facilitates the application of
algorithms and concepts to new problems. Teachers also sometimes
needed to provide more than one approach to a mathematics problem
to receive credit for their solution, which reflects the multiple strategies
approach to teaching problem solving that is stressed in the Mathematics
Framework. Two of the four vignettes in Instructional Vignettes
could be treated as problem solving situations by the teachers, but they
were not required to do so.






o All four situations in Instructional Vignettes involved a student's use of
a calculator and resulting misconceptions.

o The evaluation of the teaching of computational skills by the SSI-EM is
consistent with the Mathematics Framework's emphasis on conceptual
understanding of why algorithms work.

o One situation in Instructional Vignettes addresses estimation errors and
their remediation.

The Mathematics Framework also emphasizes the following characteristics in
terms of delivery of instruction in mathematics: teaching for understanding, reinforcement
of concepts and skills, problem solving, situational lessons, use of concrete materials,
flexibility of instruction, corrective instruction/remediation, cooperative learning
groups, mathematical language, and questioning and responding. The main theme of the
SSI-EM is teaching for understanding; every exercise has multiple scoring criteria which
address this teaching characteristic. A major focus of Short Cuts is whether the teacher
is promoting computational efficiency at the expense of conceptual skills. With the
exception of cooperative learning groups, the remaining instructional characteristics are
embedded in one or more tasks. Some tasks, however, could be modified slightly to
strengthen measurement of the appropriate use of these techniques. Lesson Planning,
for example, could be modified to address situational lessons by asking the teacher to
explain why they either did or did not include this approach in the lesson. (The scoring
would address the appropriateness of either use of a situation or the rationale for not
using such an approach, not the use or nonuse of a situation.)

The congruency of the SSI-EM with the emphases in the Mathematics Framework
is summarized in Table 5.2.



TABLE 5.2

COVERAGE OF THE CALIFORNIA
MATHEMATICS FRAMEWORK BY THE SSI-EM

                                    Method of Coverage                                  Extent of
                                                                                        Coverage

Areas of Emphasis:

Problem Solving                     In general, many scoring criteria address           Partial
                                    prerequisites for problem solving. Instructional
                                    Vignettes focus on remediating student errors in
                                    solving problems.

Calculator Technology               Instructional Vignettes focuses on student errors   Full
                                    resulting from the use of a calculator.

Computational Skills                Tasks and scoring criteria emphasize an underlying  Full
                                    base of conceptual understanding for developing
                                    computational skills.

Estimation and Mental Arithmetic    One problem in Instructional Vignettes focuses on   Partial
                                    estimation errors; mental arithmetic not
                                    addressed.

Computers in Mathematics            Not addressed.                                      None
Education

Delivery of Instruction:

Teaching for Understanding          Implicit in all tasks and scoring criteria.         Full

Reinforcement of Concepts           One scoring criterion of Lesson Planning task.      Partial
and Skills

Problem Solving                     In general, many scoring criteria address           Partial
                                    prerequisites for problem solving. Instructional
                                    Vignettes focus on remediating student errors in
                                    problem solving.

Situational Lessons                 Some vignettes address situations.                  Partial

Use of Concrete Materials           Use addressed by two scoring criteria for Lesson    Partial
                                    Planning.

Flexibility of Instruction          Scoring criteria for Lesson Planning and Short      Partial
                                    Cuts address multiple methods of presenting
                                    concepts or solving problems.

Corrective Instruction/Remediation  Instructional Vignettes and Lesson Planning         Full
                                    address remedial instruction.

Cooperative Learning Groups         Not addressed.                                      None

Mathematical Language               Addressed by scoring criteria for each task.        Full

Questioning and Responding          Focus of Instructional Vignettes.                   Partial



Extent of Coverage of California Standards for Beginning Teachers

The California Beginning Teacher Standards are criteria for teacher competence
and performance which the Commission on Teacher Credentialing expects graduates of
California teacher preparation programs to meet. Listed below are brief descriptions of
Standards 22 through 32 (with each standard following in italics). To evaluate this
assessment instrument and make inferences about the assessment approach which it
represents in terms of the appropriateness for use with California elementary mathematics
teachers, the SSI-EM tasks and scoring criteria were compared with the 11 California
Beginning Teacher Standards. Although some of the questions in the SSI-EM tasks elicited
information pertaining to a particular standard, the scoring criteria often failed to
capitalize on this information. This will be noted in the discussion of the standards
where it occurs. Each standard will be discussed separately.

Standard 22: Student Rapport and Classroom Environment. Each candidate
establishes and sustains a level of student rapport and a classroom environment that
promotes learning and equity, and that fosters mutual respect among the persons in a
class. None of the content in the SSI-EM addresses this standard. Indeed, except for
the "clearly stated expectations regarding student conduct" (CTC, 1988: 23), the other
factors such as rapport with students address teacher behavior when interacting
with students. This would be difficult to simulate in an interview situation, except
through vignettes or videotapes.

Standard 23: Curricular and Instructional Planning Skills. Each candidate prepares
at least one unit plan and several lesson plans that include goals, objectives,
strategies, activities, materials and assessment plans that are well defined and coordinated
with each other. This is addressed in depth by two tasks in the SSI-EM: Lesson
Planning and Topic Sequencing. In Lesson Planning, the teacher is asked to plan a
three lesson sequence, with the middle lesson described at length. Three of the scoring
criteria, counting for approximately 15% of the total score, address the coordination of
the lessons and the adequacy of the amount of practice devoted to two concepts taught
in the lessons. Topic Sequencing requires the teacher to group mathematical topics
according to how they should be taught. The scoring criteria do not address the appropriateness
of the grouping, focusing instead on measurement of the teacher's understanding
of the selected mathematical topics and their interrelationship. This content
knowledge is necessary to plan effective instruction.

Standard 24: Diverse and Appropriate Teaching. Each candidate prepares and
uses instructional strategies, activities and materials that are appropriate for students
with diverse needs, interests and learning styles. This standard is addressed to some
extent by Lesson Planning and Instructional Vignettes, though not in depth. In Lesson
Planning, two of the scoring criteria, representing 10% of the total score, address
the use of multiple representations of the content in presentations or responses to
student questions; three of the scoring criteria, constituting 15% of the score, address
prior student knowledge necessary to understand the concepts being taught. One series
of questions asking what would cause deviation from the plan and how a teacher monitors
student understanding was not scored. The teacher's responses to this section of
the protocol would yield information about competencies addressed by this standard.
Instructional Vignettes has four scoring criteria constituting 20% of the total score which
evaluate the appropriateness of the teacher's understanding of student thinking in vignettes
that portray student misconceptions or confusions. However, the addition of
questions and/or probes addressing assumptions about the sources of student errors
would facilitate scoring these criteria, which were often left unscored because of the lack
of information to judge the appropriateness of the teacher's response. To fully address
this standard would either require development of a new task or substantial revision of
Lesson Planning or Instructional Vignettes.

Standard 25: Student Motivation, Involvement and Conduct. Each candidate
motivates and sustains student interest, involvement and appropriate conduct equitably
during a variety of class activities. Information about motivation and the involvement of
students in the development of the lesson is elicited by Lesson Planning, though not in
great depth. Lesson Planning contains two scoring criteria constituting approximately
10% of the total score that address motivation and involvement of students in the
lesson. The appropriate use of reinforcement and feedback, setting high standards,
equitable treatment of students, and discipline are not addressed by the SSI-EM, and
would require the development of additional questions and/or a new task.

Standard 26: Presentation Skills. Each candidate communicates effectively by
presenting ideas and instructions clearly and meaningfully to students. This standard
addresses the linguistic complexity of a teacher's communications; three of the four
tasks in the SSI-EM address both this and conceptual aspects of a teacher's communication
with students. Lesson Planning has one criterion addressing the conceptual clarity
of the introduction to the lesson and two criteria that address the appropriateness of
the teacher's response to a student error, accounting for approximately 15% of the total
score. Topic Sequencing has three criteria directly addressing either the language used
to explain concepts or the knowledge of common conceptual understandings of students.
These criteria constitute 21% of the total score. Instructional Vignettes has four criteria
constituting approximately 20% of the total score which address the adequacy of
teacher explanations, both in terms of the clarity of the communication and in terms of
laying a foundation for mathematical concepts introduced later in the curriculum. It is
not clear whether or not an interview would capture nonverbal communication by a
teacher; several teachers mentioned that it was difficult to respond as if the interviewer
were a student.

Standard 27: Student Diagnosis, Achievement and Evaluation. Each candidate
identifies students' prior attainments, achieves significant instructional objectives, and
evaluates the achievements of the students in a class. One section in Lesson Planning,
which was not scored, addressed the routine monitoring of levels of student achievement
during the lesson. Four scoring criteria in Instructional Vignettes, constituting
approximately 20% of the total score, addressed the identification of student errors.
Skills in constructing and interpreting summative forms of evaluation are not addressed.

Standard 28: Cognitive Outcomes of Teaching. Each candidate improves the
ability of students in a class to evaluate information, think analytically, and reach sound
conclusions. Student outcomes are not directly addressed by the SSI-EM. However,
many of the scoring criteria focus on whether the teacher is laying a cognitive foundation
that enables the student to achieve understanding of mathematical concepts and
their interrelationship. For example, one of the criteria for scoring Topic Sequencing is
whether the metaphors and analogies, if any, used to explain concepts facilitate or
hinder understanding of the concepts.

Standard 29: Affective Outcomes of Teaching. Each candidate fosters positive
student attitudes toward the subjects learned, the students themselves, and their capacity
to become independent learners. The encouragement of positive interaction among
students and the provision for independent learning experiences are not addressed by
any task in the SSI-EM. Student motivation was discussed under Standard 25.

Standard 30: Capacity to Teach Crossculturally. Each candidate demonstrates
compatibility with, and ability to teach, students who are different from the candidate.
The differences between students and the candidate should include ethnic, cultural,
gender, linguistic and socioeconomic differences. This standard is not addressed by the
SSI-EM; to do so would require adaptation of the tasks or development of new tasks.
This is further discussed in a subsection on the appropriateness of the SSI-EM for
assessing teachers who teach diverse students.

Standard 31: Readiness for Diverse Responsibilities. Each candidate teaches
students of diverse ages and abilities, and assumes the responsibilities of full-time
teachers. This standard refers to student teaching experience, although it could be
interpreted to apply to the ability of teachers to accept teaching assignments that span
the elementary grades. The SSI-EM concentrates on a single grade level; it could be
constructed, however, so every task would address a different topic at a different grade
level. If this approach were taken, some of the scoring criteria in the SSI-EM, such as
identifying what students find difficult about a topic, would become problematic because
teachers who have taught the topic would be advantaged relative to those who have
not.

Standard 32: Professional Obligations. Each candidate adheres to high standards
of professional conduct, cooperates effectively with other adults in the school
community, and develops professionally through self-assessment and collegial interactions
with other members of the profession. The SSI-EM does not address this standard.
Since the SSI-EM focuses on content pedagogy, any task constructed to measure
this standard would be qualitatively different from the other tasks, all of which focus on
the teaching of elementary mathematics.

The extent to which the SSI-EM covers the California Standards for Beginning 
Teachers is summarized in Table 5.3. 



Job-relatedness 

Teachers, assessors, and scorers were asked their opinion of the assessment's
job-relatedness. Thirty-one of the 40 teachers agreed that "all the major tasks (i.e., Lesson
Planning, Topic Sequencing, Instructional Vignettes, and Short Cuts) composing this
assessment are relevant to their job of teaching"; eight felt they were not. Several
teachers commented that their students asked them some of the same questions contained
in Instructional Vignettes and Short Cuts. Five teachers singled out Instructional
Vignettes as being irrelevant, and four each identified Topic Sequencing and Short Cuts
as irrelevant. One of these teachers identified both Instructional Vignettes and Short
Cuts as irrelevant.

The teachers expressed a variety of reasons for judging some of the SSI-EM
content to be irrelevant. One teacher was not sure that topics need to be taught in
sequence; another observed that texts sequence topics; another objected to the focus on 
methods and mathematical reasoning in Short Cuts; and another felt that calculators 
received too much focus in Instructional Vignettes compared to their representation in 
the curriculum. More than 75% of the new teachers considered the SSI-EM to be job- 
relevant, however. 

Most of the assessors and scorers tended to feel that the SSI-EM tasks reflected
a teacher's responsibilities in the classroom. The one task that some assessors did not
feel was related to a new teacher's experiences was Topic Sequencing, since this is
prescribed by textbooks. However, assessors felt that Topic Sequencing reflected a
perspective on instructional design that would be desirable for a teacher to have.
Scorers of Topic Sequencing believed that elementary teachers, and especially fifth
grade teachers, need to be able to sequence topics to effectively plan instruction.



Appropriateness for Beginning Teachers

The degree of appropriateness for beginning teachers was judged from two kinds 
of evidence: (1) the perceptions of teachers, assessors and scorers, and (2) the
performance of the teachers on the assessment tasks. 






TABLE 5.3

EXTENT OF COVERAGE BY THE SSI-EM OF
CALIFORNIA STANDARDS FOR BEGINNING TEACHERS

Standard                               Method of Coverage                  Extent of Coverage

22: Student Rapport and Classroom      Not covered.                        None
    Environment

23: Curricular and Instructional       Addressed in depth by Lesson        Full
    Planning Skills                    Planning and Topic Sequencing

24: Diverse and Appropriate Teaching   Partially addressed by Lesson       Partial
                                       Planning and Instructional
                                       Vignettes, though not in depth.

25: Student Motivation, Involvement,   Ability to motivate and involve     Partial
    and Conduct                        students assessed in Lesson
                                       Planning and scored by two
                                       criteria.

26: Presentation Skills                Scoring criteria for three tasks    Partial
                                       that address conceptual clarity
                                       and appropriateness of teacher
                                       explanations.

27: Student Diagnosis, Achievement,    Focus of one task.                  Partial
    and Evaluation

28: Cognitive Outcomes of Teaching     Not directly covered.               None

29: Affective Outcomes of Teaching     Not covered.                        None

30: Capacity to Teach Crossculturally  Not covered.                        None

31: Readiness for Diverse              Not covered.                        None
    Responsibilities

32: Professional Obligations           Not covered.                        None




Perceptions. For the most part, teachers felt that, as new teachers, they had
"sufficient opportunity to acquire the knowledge and abilities needed to respond in a
reasonable manner to the assessment questions and tasks," with 28 of the 40 teachers
marking "yes," nine marking "no," and one giving a qualified answer. However, 15 of
the teachers felt the Topic Sequencing task was too difficult, and three teachers each
identified Instructional Vignettes and Short Cuts as too difficult. Seven other teachers
specifically criticized the content of the assessment as being too difficult, identifying
different aspects such as the lack of supplementary materials or questions which asked
them to explain why fractions are useful in real life or how to teach material they had
not previously taught. As will be seen in the discussion of the teachers' performance,
teachers generally exhibited gaps in content knowledge which might make it difficult to
evaluate supplementary materials or identify applications of mathematical concepts.
Describing how to teach unfamiliar material is difficult for beginning teachers, but the
focal topics of the SSI-EM were all part of the elementary curriculum covered by the
multiple subjects credential. Including topics previously taught would afford teachers
maximal opportunity to draw upon their teaching experience as well as their knowledge
of mathematics pedagogy. However, since the credential covers a broad range of grade
levels, it is inevitable that an assessment consisting of an adequate sample of grade
levels covered by the credential would include some topics which a beginning teacher
had not taught.

Two teachers suggested that assessment be delayed until the second year, echoing
comments by second-year teachers that they were glad they had one full year of
experience prior to the assessment because they would not have done as well had they been
administered the assessment in their first year. Such feelings are articulated well in the
following comment:

I think it would be extremely difficult for a beginning teacher
to demonstrate a competent understanding. The first year
should entail in-services or further professional instruction by
peers, etc. Assessment should be during the second year.

Another teacher felt that questions about curriculum (which were not scored) were 
unfair: 

New teachers are not generally aware of the abilities or 
curriculum for any particular grade level. They learn this 
after they are hired. 

Despite teachers' perceptions of adequate preparation, many teachers had
difficulty describing mathematical concepts. Teachers were often at a loss when asked to
provide a mathematical justification for a solution to a problem. This was true even for
some teachers who had just correctly explained how to work the problem. As one
assessor commented:

Instructional Vignettes seemed about right in content. 
However, while many teachers could describe the steps they 
would take in teaching or solving the problem, they had 
trouble naming the concepts. 






The two assessors who were classroom teachers felt that the assessment was too
focused on mathematical sophistication and on content that was relatively difficult for
beginning teachers. For teachers who were struggling with the content, it was difficult to
display their skills in pedagogy and pedagogical content knowledge. Most teacher
preparation programs do not require extensive courses in math methods; often a single
course covering grades K-8 is all that is required. People who choose elementary
teaching as a career are not required to have an extensive background in mathematics,
and they may take several years to be comfortable with content that is included in the
fifth- or sixth-grade mathematics curriculum. However, if the elementary mathematics
curriculum is to be upgraded in line with the expectations of the Mathematics
Framework, teachers will need a more sophisticated understanding of mathematics very similar
to that required by the SSI-EM.

Scorers generally thought that the SSI-EM is a good prototype for assessing new 
teachers in the area of elementary mathematics, but identified shortcomings in specific 
areas where they believed that too much was expected of new teachers. These areas 
included: (1) the complexity of the evaluation of one of the Short Cuts (which did not 
work for a specific group of numbers) within the limited time period provided; (2) 
asking beginning teachers to depart from textbook orderings of lessons, which requires a 
great deal of professional self-assurance; (3) seeing the limitations of "short cut" algo- 
rithms, which depends on familiarity with student error patterns; and (4) ranking a set 
of topics in terms of student difficulty when students find most of the topics difficult.

Performance on assessment tasks. Performances on the specific tasks indicate
that the majority of the teachers were ill prepared to adequately respond to the ques- 
tions in the SSI-EM. While this was partly due to the original focus on the identification 
of exemplary master teachers, it was also due partly to weak content knowledge. 

Table 5.4 shows the performance of the teachers on the tasks. Not all teachers 
are included in the table. Five of the 164 tapes could not be scored due to a failure to 
record the entire interview. 
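
The percentages reported in Table 5.4 follow directly from the pass and fail counts for each task. As an illustrative check only, the short Python sketch below recomputes each pass rate from those counts. The variable and function names are hypothetical, and rounding to the nearest whole percent is an assumption about how the table was prepared.

```python
# Pass/fail counts per task, transcribed from Table 5.4.
# All names here are illustrative; they do not come from the report.
RESULTS = {
    "Lesson Planning (Fractions)": (12, 13),
    "Lesson Planning (Ratios)": (7, 8),
    "Topic Sequencing (Fractions)": (5, 18),
    "Topic Sequencing (Ratios)": (7, 10),
    "Instructional Vignettes": (24, 14),
    "Short Cuts": (22, 18),
}


def pass_rate(passed: int, failed: int) -> int:
    """Return the percentage passing, rounded to the nearest whole percent."""
    total = passed + failed
    return round(100 * passed / total)


# Recompute the "% Passing" column (assumed rounding: nearest whole percent).
pass_rates = {task: pass_rate(p, f) for task, (p, f) in RESULTS.items()}

for task, rate in pass_rates.items():
    passed, failed = RESULTS[task]
    print(f"{task}: {passed} of {passed + failed} passed ({rate}%)")
```

Under these assumptions the recomputed values agree with the table, ranging from 22% for Topic Sequencing (Fractions) to 63% for Instructional Vignettes.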

Although teachers felt most comfortable with Lesson Planning, they did not tend
to do well on it. This may have been because the teacher's plan was evaluated on the
basis of its capacity to foster conceptual understandings among students and not
according to characteristics of its format. Criteria which most teachers were unable to meet
included:

o communicating to students when the process of factoring was complete
  (i.e., how to know when you have found the answer);

o providing an adequate amount of practice for factoring (a concept
  which students find very difficult, but which is critical for that particular
  lesson); and

o explaining why there can be percentages greater than 100.

Although some of the teachers presented good application problems at the beginning of 
the lesson to capture student interest, almost half of the lessons on fractions were judged 
to be inadequate in motivating students. Most teachers did do well on: 






TABLE 5.4

SEMI-STRUCTURED INTERVIEW IN ELEMENTARY MATHEMATICS:
SCORING RESULTS

Task                            No. of Teachers*   No. Passing   No. Failing   % Passing

Lesson Planning: Fractions            25               12            13           48%
Lesson Planning: Ratios               15                7             8           47%
Topic Sequencing: Fractions           23                5            18           22%
Topic Sequencing: Ratios              17                7            10           41%
Instructional Vignettes               38               24            14           63%
Short Cuts                            40               22            18           55%


Number of Tasks Passed*      0     1     2     3     4
Number of Teachers           4     9    12     6     5

*Due to five failures to record the entire interview, the number of teachers does not
always total to 41. The number of tapes affected differed by task.




o actively involving the students in the lesson (e.g., asking students
  questions during the lesson);

o [illegible] procedures and conceptual understanding during the lesson;

o keeping a smooth instructional flow between the sequence of three
  lessons described;

o providing a clear introduction to the lesson; and

o for the lesson on fractions, providing more than one representation of
  the content.

Teachers had great difficulty in explaining the topics in Topic Sequencing. If
definition by example had not been allowed, the scores would have been even [illegible].
While a few of the topics were not clear from the title on the cards (e.g., fractions as a
[illegible]), [illegible] applications from the [illegible] grade [illegible]
categories did not seem to both scorers and FWL staff [illegible]. Additional probing or
asking [illegible] desired [illegible] of response [illegible].

[illegible] because they did not do the following in their discussion of topics:

o perceive the importance of cross-multiplication in a unit on fractions;

o perceive the importance of finding the percent of a number in a unit on
  ratios and proportions;

o provide an explanation of why common denominators are needed to
  add and subtract fractions; and

o defend their choice to add or delete topics.

In many instances, a concept which is clearly antecedent to another was placed well
after it in the ordering. The teachers did best at identifying one or two of the
most difficult topics to teach and explaining why students found them difficult.

The teachers performed best on Instructional Vignettes, which required them to
[illegible] solving problems with the use of calculators. This task was the only one
in which any of the teachers received perfect [illegible]. Teachers generally could
identify the student error and appropriately discuss the error with the student. Some
teachers' spontaneous explanations were not only conceptually appropriate and creative
but extremely clear and concise, using metaphors, examples or analogies which would
appeal to fifth-grade students. One example is the teacher who explained why students
need to learn "long" ways to compare the value of two fractions instead of just using a
calculator:






You always need to learn the long way before you learn
short cuts. It's like you learn the way to school and once you
know the way to school, then you can cut across that dirt
path, but if you cut across that path first, then you're not
going to know where you're going to end up, so you have to
know the long way and then you can make up your own
short cuts.

Over one-third of the teachers, however, could not figure out how to convert a recipe 
for six people to a recipe for eight people, as required by one of the vignettes, and as a 
consequence were unable to explain the problem to their students. 

Teachers also had difficulty with key elements of Short Cuts, which required them 
to evaluate two algorithms for simplifying either the reduction or comparison of frac- 
tions. In discussing each short cut, teachers had difficulty with the following: 

o identifying limitations;

o justifying whether or not they would teach it;

o describing ways to facilitate teaching it;

o providing a mathematical rationale for why it does or does not work;
  and

o identifying whether or not it works for all fractions.

One short cut was criticized by the teachers, scorers, and FWL staff for the complexity 
of the reasoning required to figure out whether or not it works. It was too complex to 
comprehend in a short period of time. However, teachers also had difficulty with the 
other short cut, which did work because of the identity principle. Teachers should be
familiar with this concept, if not its name, when they teach math to intermediate stu- 
dents. 

If the beginning teachers who participated in the SSI-EM are typical of recent 
graduates from teacher preparation programs (and there is no apparent reason to
believe that they are not), then it would seem that beginning teachers are not equipped
with an understanding of either mathematics or mathematics pedagogy sufficient to 
perform well on an assessment modeled after the SSI-EM. The low levels of perform- 
ance are probably due partially to the original focus of SSI-EM on assessing master 
teachers. However, many of the assessors, scorers and FWL staff felt that the level of 
content knowledge exhibited by many teachers compromises their ability to teach 
mathematics to elementary students. Other work has found that teachers who can solve 
mathematics problems do not necessarily have an understanding of the underlying 
mathematical concepts and relationships (Leinhardt and Smith, 1985). This knowledge 
is needed for competent design of instruction. 

It is often said that teachers, especially elementary teachers, do not need to be 
specialists in mathematics. Research on teaching of elementary mathematics finds that 
the quality of the developmental portion of a lesson differs considerably between effective
and ineffective teachers (Good and Grouws, 1975). The developmental portion is defined as:






The developmental portion of a mathematics period is that
part of a lesson devoted to increasing comprehension of
skills, concepts, and other facets of the mathematics
curriculum. For example, in the area of skill development,
instruction focused on why an algorithm works, how certain skills
are interrelated, what properties are characteristic of a given
skill, and means of estimating correct answers should be
considered part of developmental work. In the area of
concept development, developmental activities would include
initial instruction designed to help children distinguish the
given concept from other concepts. Also included would be
the associating of a label with a given concept. Attempts to
extend ideas and facilitate transfer of ideas are a part of
developmental work (p. 114).

It is precisely these skills which most of the participating teachers lacked.
Furthermore, these skills, which go considerably beyond the ability to work the problems in
the textbook, do not necessarily develop fully with experience. The Mathematics
Framework takes the position that current teachers of elementary mathematics need an
understanding of mathematical concepts and their interrelationships in order to make
effective instructional decisions. While there are many instructional strategies that
enable most students to work the problems in the textbook, some choices are superior
to others in facilitating both mathematics instruction later in the curriculum and
mathematical applications in daily life. The Mathematics Framework acknowledges that
this goal requires a more rigorous preparation in mathematics education than most
elementary teachers currently receive. It is quite likely that achieving the curriculum
goals of the Mathematics Framework will require performances on the level of the
SSI-EM.



Regardless of whether or not an assessment such as the SSI-EM is adopted, the
performances of the teachers add credence to suggestions (e.g., Lampert, 1988) that
teacher preparation programs need to strengthen their instruction in mathematics and
mathematics pedagogy. Such strengthening cannot occur by requiring additional courses
which concentrate on problem-solving algorithms. Instead, the additional preparation
must focus on teaching the concepts and principles that underlie problem-solving
algorithms. One California university has developed a four-week summer workshop which
teachers attend to gain the skills necessary to implement the Mathematics Framework.
Teachers increase their knowledge of elementary mathematics topics, and learn problem
solving and group activities, which helps to equip them to implement the Mathematics
Framework. However, the same institution's mathematics methods course for elementary
teachers has not incorporated similar instruction because of time constraints within
the current course. The course cannot be lengthened due to competing priorities within
the year-long multiple subject credential program.

Comparison of beginning and experienced teachers. The four tasks comprising
the SSI-EM were part of a larger set of tasks that was initially administered to teachers
with varying amounts of experience by the Stanford TAP. In this initial administration,
six teachers had more than 10 years of teaching experience, while seven had two years
or less, allowing comparisons between beginning and experienced teachers. All tasks






were scored on three dimensions: Command of Subject Matter, Content-Specific
Pedagogy, and Pedagogical Sensitivity and Responsiveness to Students. New and
experienced teachers differed least on the Command of Subject Matter dimension, which
measured knowledge of mathematics as a discipline, i.e., its structure, boundaries and
substance. Weak differences between the two groups were found for the Content-Specific
Pedagogy dimension, which evaluated the ability to present mathematical
knowledge in a way that facilitates student learning. Strong differences were found between
more and less experienced teachers on the Pedagogical Sensitivity and Responsiveness
to Students dimension. Here the assessors examined descriptions of teacher-student
interactions, including engaging students, providing appropriate feedback, and
establishing interpersonal relationships with students. Instances in which scorers felt there was
not enough information to reliably rate a particular criterion were markedly more
frequent for novices than for experienced teachers, particularly within the dimension of
Pedagogical Sensitivity and Responsiveness to Students.



Appropriateness across Contexts 

FWL also evaluated the appropriateness of each assessment across varying
teaching contexts. Twenty of the 40 teachers providing comments on the SSI-EM did not
feel it was a good measure of teaching ability across grade levels and subject areas and
across different student groups and in different school/community settings. (Ten
teachers felt it was; four teachers gave a qualified "yes" answer; and six had no response.)
Most of the criticisms related to grade-level/subject-matter differences and diverse
students.

Grade level and subject matter. Regarding the appropriateness of the SSI-EM
for use in credentialing, several teachers were concerned about the inexperience of new
teachers with the specific content of this instrument. Although this assessment was
designed for and administered to fifth- and sixth-grade teachers, some of the teachers
commented that they had not yet taught the material on which they were being
assessed. Commented one teacher, "A new teacher is best able to talk about that area
on which she spends time." Another teacher remarked, "Fractions is a hard concept to
teach or think about how you'd teach it if you haven't already tried." One scorer who
was a math education professor believed that some competencies, e.g., the ability to
identify aspects of a lesson that were difficult for students, depended strongly on
experience teaching that lesson. This theme of experience extended into other areas as well.
Teachers in one district had used the same textbook that was used in the assessment;
several of these teachers indicated that their familiarity with the text and the way it was
organized helped them in the Lesson Planning task.

Teachers who had taught the topic were undoubtedly advantaged in drawing upon
their experience to answer some questions, such as areas of student difficulty and
activities that motivated students. However, these areas did not constitute the majority of the
scoring criteria. Teachers could pass each task in the assessment if they could clearly
explain mathematical concepts which are fundamental parts of the upper-elementary
school curriculum, and determine which skills were necessary to learn specific
mathematical concepts. This ability does not depend on experience teaching the topic, but upon
familiarity with the topic. All teachers with multiple subject credentials are likely to
teach these concepts, so measuring their level of content and pedagogical knowledge for
topics which they have not taught does not seem to be unreasonable.






The influence of experience in teaching particular concepts is an issue that
extends beyond the grade level at which the concepts are ordinarily taught. There was
much concern among the assessors about the suitability of this assessment for
primary-grade teachers, and concurrent interest in whether there would be two versions of the
assessment for primary and upper-elementary teachers. The SSI-EM has a narrow
focus, and any future assessment for the multiple subject credential would need to be
more balanced in representing concepts across the whole elementary mathematics
curriculum. To design an assessment in a single subject area, such as mathematics, which
is suitable for teachers at all levels represented by the multiple subject teaching
credential is challenging. However, as the name of the credential signifies, elementary
teachers teach many subjects, which further compounds the difficulty of assessment design.

Diverse students. Some teachers felt that the assessment did not take into
account teaching differing student populations. One teacher, for example, expressed the
belief that an assessment would not be appropriate if the questions were not geared
specifically to the types of students (e.g., low-ability, LEP) that a teacher has been
teaching. Assessors reported that teachers who had taught low-ability students (who were
consequently at earlier points in the mathematics curriculum than the focal topics of the
SSI-EM) seemed to have difficulty in drawing on their experience in answering the
interview questions.

The SSI-EM needs to be improved in terms of assessing a teacher's ability to work with
diverse students, either heterogeneous classes or specialized student populations. Since
these types of classrooms are increasing, the revision of the SSI-EM to address this issue
is not just an issue of fairness to teachers in differing contexts. It is also a matter of
including all important teaching skills. Within a semi-structured interview format, one
way to address this issue would be to construct vignettes describing children or
classrooms with particular characteristics, and ask teachers to describe how they would
construct a particular activity in the specified context. The vignettes would need to be
carefully constructed to avoid stereotyping particular groups.



Fairness across Groups of Teachers 

Over half of the teachers felt that the assessment was fair to new teachers of
both genders, different ethnic groups, different language groups, and other groups of
new teachers (26 of the 40 teachers agreed, nine disagreed, and one gave a qualified
answer). Of those who disagreed, the only reason given by more than one respondent
was suggested by four teachers who felt that teachers from different linguistic groups
would be disadvantaged because of the verbal skills required by the interview format.

None of the participating teachers was limited in English proficiency, though 
there was at least one teacher of a bilingual classroom. Not surprisingly, then, none of 
the assessors or scorers mentioned fairness to teachers of differing linguistic ability as a 
concern. However, there was some informal discussion of whether highly verbal teach- 
ers have an advantage over less verbal teachers. 

The concerns related to fairness for different groups of teachers that were
mentioned by the assessors on their feedback forms were: fairness across age groups,
fairness to minority teachers, and fairness to teachers for whom mathematics was not a






strength. During informal discussions, one group of assessors concluded that both very
young and very old teachers seemed to have difficulty with the assessment. The young
teachers seemed to be especially nervous and anxious about their performance. The
older teachers usually were coming to teaching after a long absence from formal
schooling, and seemed to have special difficulties with the terminology of the questions. The
number of teachers at the extremes was very small, and their individual scores varied
considerably, so no firm conclusions can be drawn.

In terms of the number of SSI-EM tasks that were passed, the six minority
teachers were not statistically different from the 34 non-minority teachers, but the sample
sizes were too small to warrant making firm conclusions. Assessors at one site reported
concern about the performance of one minority teacher in particular, which raised
questions in their minds about the assessment. They felt that this particular teacher showed
great commitment to and potential for motivating inner-city students. However, the
teacher's performance on the assessment demonstrated particularly weak content
knowledge. This teacher expressed concerns about his/her inadequate content
knowledge, acknowledging that the assessment was fair, since students often asked the same
kinds of questions.

The assessors felt that this teacher showed promise as a teacher, but needed
more strength in math content knowledge. They also felt that this teacher was likely to
seek out assistance and benefit from it if it were offered. They were concerned about
whether a state assessment for certification would allow or provide needed support.
This is particularly important in view of the difficulty of attracting teachers to work in
the inner city, the teaching profession's difficulty in attracting minority teachers, and the
percentage of minorities failing current methods of assessment.

The assessors also raised questions about the fairness of the SSI-EM among 
teachers for whom mathematics was not a strength. They pointed out that elementary 
teachers must attain competence in a number of subjects requiring different skills. It 
may not be reasonable to expect teachers to attain the same degree of proficiency in all 
subjects. Although the state has expressed an interest in designing an assessment 
system which allows for particularly strong performances in one area to compensate for 
weaknesses in another, it is a policy decision whether "area" should be extended to 
apply to subjects as well as to specific teaching competencies. 

Since the SSI-EM covered one of the most difficult mathematical topics that are 
taught at the elementary level, assessors also questioned whether the level of perform- 
ance exhibited on these tasks was representative of a teacher's ability to teach less 
pedagogically difficult topics. This would be particularly important for those teachers 
who are less familiar with mathematics. 



Appropriateness as a Method of Assessment

While teachers believed the SSI-EM tasks were fair, they were almost evenly split
on whether the subject matter and concepts were appropriate for demonstrating their
teaching skills, with 16 of the 40 teachers marking "yes," 17 marking "no," and three
giving qualified answers. Of the 17 teachers who marked "no," six explained that they
had not previously taught the topics, and felt that their performance was not
representative of their teaching skills. Seven participants objected to the narrow focus on one






topic. Six teachers did not feel that the assessment conditions were realistic for various
reasons such as the pressurized context and the limited extent of preparation.

As discussed earlier, we share the teachers' reservations about the limited focus.
The decision whether or not to assess teachers on topics which they have not taught
depends on a policy decision concerning the information to be gained from pilot testing.
On the one hand, teachers are credentialed to teach across grade levels and topics. On
the other hand, if competencies are being assessed which are assumed to depend upon
experience, then it would be most appropriate to assess teachers in areas in which they
had experience teaching. The teachers' reservations about the "realism" of the
assessment conditions should be evaluated in a similar light. If the intent is to see a teacher's
pedagogical decisions in the best light, then more time should be provided for planning
a lesson, and supplementary materials should be available. On the other hand, it seems
unlikely that any of the improvements suggested by the teachers would result in anything
but marginal differences in the display of pedagogical content knowledge. If teachers do
not understand basic mathematical concepts and their interrelation, their choices are not
likely to improve given either additional time or supplementary materials whose quality
they are unable to evaluate.

Reflecting their limited knowledge of tasks which they had not administered,
assessors and scorers tended to be task specific in their perceptions of the ability of the
assessment to measure teaching competency. In informal discussions, assessors praised
Lesson Planning for its ability to elicit information about a teacher's pedagogical
knowledge and pedagogical content knowledge, especially with regard to a teacher's design of
instruction. The assessor who administered Topic Sequencing felt that the task based on
elementary fractions was reasonable, while the one based on ratio and proportions "did
not work." Other assessors and scorers felt that many elementary teachers are not
prepared to be tested on their knowledge of mathematical concepts and how they
interrelate. Assessors felt that there were alternative models of tests such as existing
multiple-choice tests that adequately examine a teacher's content knowledge, and that
interviews were more appropriate for assessing pedagogy and pedagogical content knowledge.

On the whole, assessors and scorers felt that the method of semi-structured
interviewing had some merit, although the SSI-EM itself needed revisions to address
pedagogy and pedagogical content knowledge more fully. A couple of the assessors and
scorers agreed with many teachers in their belief that simulations and artificial conditions did
not fully tap a person's ability to teach and that any interviews should be supplemented 
by classroom observations. One scorer believed that some of the poor-scoring teachers 
were probably good teachers in other subjects. The two assessors on sabbatical from 
classroom teaching felt that supplementary materials should be provided to make the 
assessment more reflective of teaching by first-year teachers. 

Although the SSI-EM demands a high level of content knowledge, it assesses
quite well a teacher's ability to represent content, explain concepts, and sequence
instruction. All of these are aspects of content pedagogy which depend on a
sophisticated level of content knowledge.






Assessment Format 



Teachers, assessors, and scorers were asked their perceptions of various aspects
of the Semi-Structured Interview and Assessment Center formats. Their comments are
summarized below. This section provides information about the clarity of the
assessment, the choice of tasks, the use of probes, and the use of interviewers.



Clarity of Assessment

A detailed description of teacher responses is found in the Administration Report 
for Spring, 1989. Teachers generally felt that the materials and instructions for the SSI- 
EM were clear. Roughly two-thirds of the teachers found the written materials they 
received prior to the assessment to be helpful. These materials did not include detailed
descriptions of each task. Descriptions of the tasks and scoring criteria prior to the
assessment would assist teachers in anticipating and preparing for the assessment. It is
unclear, however, whether and if so, how, this additional information would affect the 
level of anxiety experienced by the teachers, especially those who are not confident of 
themselves in mathematics. 

Nearly all of the 40 teachers found the oral overview at the beginning of the day
to be helpful, and the directions for the assessment clear. Suggested changes included: 
send the orientation materials in advance, give a clearer idea of the all-day process, 
include driving time in the directions, and revise the letter to make it sound less intimi- 
dating. None of the teachers suggested changing the directions for the tasks. 

Assessors felt that the instructions for the tasks were generally complete, detailed 
and clear. Few changes were suggested. One assessor suggested that some of the 
terms, e.g., "assessment instrument," "math concept," were confusing and should be ex-
plained or written in the vernacular. Another assessor felt that changes in tone from "I
now want you to..." to "Please, now..." would improve the atmosphere and put the
teachers more at ease. 

Some teachers experienced difficulties in performing the tasks. Some of the diffi- 
culties were at least partially due to lack of content knowledge, e.g., the expressed need 
for more "background experience in math" to properly prescribe remedial instruction for
the Instructional Vignettes, and insufficient understanding of some of the topics in Topic
Sequencing. Other sources of confusion were the lack of information about the evalua-
tion criteria, the redundancy and difficulty of the questions, and a belief that the Topic
Sequencing tasks depend on the characteristics of the students being taught. We believe
that teachers should be informed of the general scoring criteria prior to the assessment
and be guided in some manner during the interview to provide responses in sufficient
scope and detail to reduce ambiguity in coding their responses. The latter can be done
by adding or revising questions to better focus responses, or by giving each teacher a list
of areas to be addressed in the response to broad questions (as was done in the SSI-
SM), e.g., when providing an overview of the lesson.

The questions sometimes seemed redundant to teachers when they were asked to 
explain how they would teach a student to work a problem, followed by a request either
[illegible] distinguish between algorithms to solve problems and
[illegible] underlying algorithms. In these cases, the questions asking the
[illegible] asked teachers to explain why [illegible].

[illegible] sequence of topics varies according to student
characteristics [illegible]. Given the way
that most teachers had been trained in mathematics to look for the solution and not for
[illegible] there was no one correct sequence, topics could be grouped in sets
[illegible] were prerequisites for understanding
other topics. Therefore, some orderings would be incorrect for any group of learners.

[illegible] regarding the assessment before the scoring
criteria were developed. Although the scoring criteria do not directly correspond to the
[illegible] match between criteria and questions which [illegible] to yield
[illegible] many of the questions, a slight rewording
[illegible] more likely to yield a response that directly addresses a
scoring criterion. For example, one scoring criterion for Topic Sequencing is the validity
[illegible] to students [illegible] question
[illegible] they explain the topics to their students.
[illegible] what each topic meant to them. When a response
[illegible] ambiguous and the question had not been asked
[illegible].

[illegible] frustration with rating borderline responses, suggesting
a need [illegible] distinguishing suitable from unsuitable answers. Their frustration
[illegible] number of responses which they rated as ambiguous, which
ranged from seven in Lesson Planning to 66 in Instructional Vignettes. Some of the
[illegible] than others, [illegible] problems in
[illegible] expressed frustration with scoring criteria which
did not closely correspond to questions asked. Since scores were for the most part
[illegible] of responses judged adequate, excluding a category means
[illegible] category are not reflected in the overall
score [illegible].



The SSI-EM task questions should be revised to better match scoring criteria
before [illegible] administered again. These revisions might range from rewording
[illegible] that could elicit a
[illegible]. Another route to reducing ambiguous responses is to provide
[illegible] so they better frame their responses. The
provision of scoring criteria to the assessors should improve their ability to probe more
[illegible]; suggested probes can be provided for specific questions to standardize
[illegible].

In addition to providing better alignment between questions and scoring criteria,
rewording the existing questions for a more explicit focus could also help standardize
responses to make them easier to evaluate. One assessor commented that the vignettes
produced a wide variety of interpretations, with different teachers seizing on differing 
but, in her opinion, equally important aspects of the vignette in their response.



Format Features



In the pilot testing of the SSI-EM, several features of the semi-structured inter- 
view format were identified as either helpful or problematic. They include the timing of 
the exercises, the choice of tasks, the use of probes, and the use of interviewers. 

Timing. The assessment was originally scheduled to take a little over six hours
(including orientation, an hour for lunch, and feedback from teachers). In this schedule,
an hour was allocated for each task. In practice, however, teachers did not take the
entire time allotted. Lesson Planning generally took an hour, but most teachers com-
pleted each of the three other tasks in a half hour or less. Although it was known from
[illegible] that Lesson Planning would take more time than the other tasks, the extent of
the difference was not known. 

When the extent of the disparity in length of time required to complete each task 
was discovered on the first day, assessors began to experiment with ways to accommo- 
date these differences. In the first week of administration, teachers tended to spend the 
wait time between tasks either reading newspapers or in conversation with assessors in
the hospitality room. By the middle of the second week of administration, assessors 
began altering the schedule by administering a task to a teacher as soon as that teacher 
had completed the previous task scheduled and had indicated a willingness to continue. 
This resulted in teachers completing the assessment at varying times, with the teacher 
who had Lesson Planning scheduled last taking the most time. 

By the final round of administration, FWL staff had learned from experience how
long each task took, so an additional assessor was added to administer Lesson Planning,
and a schedule minimizing wait time was devised. For an assessment center format to
be efficiently implemented, it is essential that tasks be designed to take roughly equiva-
lent amounts of time. In practice, there will be individual differences in task perform-
ance that will complicate strict adherence to any predetermined schedule. However, 
prior piloting of tasks should reveal any great disparities in the average amount of time 
required for completion. 

About three-fourths of the 40 teachers did not feel they needed more time for
any tasks, though about one-fifth did, with one teacher mentioning specifically Lesson
Planning and another Instructional Vignettes. Generally, most did not feel there should
be less time for any of the tasks.

Choice of tasks. Although the relative length of the tasks needs improving, the
four tasks (Lesson Planning, Topic Sequencing, Instructional Vignettes, and Short Cuts)
do a good job of reflecting key activities which teachers must do to effectively plan and
manage instruction at a conceptual level. Topic Sequencing and Short Cuts give good
illustrations of the command of the subject matter. All four, but especially Lesson Plan-
ning and Instructional Vignettes, address the ability to translate concepts into appropri-
ate metaphors or activities. Lesson Planning and Topic Sequencing address the ability
to sequence instruction so that concepts are taught in such a manner that they build on
previous student knowledge and lay the foundation for material that occurs at a later 
point in the curriculum.

Questions about one of the tasks, Instructional Vignettes, arose during the admin-
istration and scoring of the SSI-EM. Our concerns about vignettes are based not only
on our experience with the SSI-EM, but also with observation of the use of vignettes in
the Stanford BIOTAP pilot test. Few of the teachers had difficulty responding to the
vignettes. However, the concerns of teachers who found the vignettes artificial suggested
two potential problems in the use of vignettes. The first is that teachers use a variety of
cues in formulating a response to students that are unavailable in vignettes. With the
exception of the period at the beginning of the school year, teachers know something
about their students. In responding to a student, a teacher takes into consideration the
student's past level of performance, personality traits such as perseverance, knowledge of
what the student has been taught previously in the year and previously established rou-
tines for remediation. This information is unavailable to teachers when vignettes are
presented, resulting in a variety of responses related to unarticulated assumptions based
on the teachers' own experience. Teachers varied in their comfort with making these
assumptions; it is possible that teachers who are especially skilled at tailoring responses
to students might have the greatest difficulty in responding to hypothetical
students.

Some of these difficulties can be overcome by adding additional information to 
the vignette and by piloting the vignette with teachers from varying contexts. For in- 
stance, one assessor found that teachers exhibited varying but equally valid interpreta-
tions of a single vignette. She recommended that the questions be reworded to make the
focal point clear. If a particular focus is desired, then the critical information can be 
included in the vignette. 

Because vignettes are artificial, they are unlikely to capture the ability to respond to
individual students in a holistic manner which takes into account a student's mood at the
time, personality characteristics, preferred learning style, and content knowledge. They
do seem to capture the ability to create metaphors, use alternative representations of
[illegible] spontaneously design activities to explain concepts and correct misconceptions.

The second problem with vignettes that became obvious during Lesson Planning
is that teachers vary in the way that they design instruction. Some situations that are
designed to elicit teachers' responses to common student errors are inappropriate when
the teacher has carefully constructed the lesson to avoid producing such errors. One
example is when a student is shown converting 2/50 to a percentage and arriving at an
answer of 2%. Some teachers had spent quite some time initially in the lesson talking
about the meaning of a percentage, having everyone draw pie charts representing per-
centages, and converting fractions with 100 as a denominator to percentages. One
teacher went so far as to call the process of conversion of a fraction to a percentage
"renaming." If, as in these instances, a teacher has carefully laid a foundation for
student understanding that makes it less likely that students make this type of error,
then the only situations in which students produce that type of error are (1) if they have
completely missed the point of the lesson, or (2) when they are displaying careless
thinking. Teachers who make these assumptions will react differently from teachers
who assume this is an instance of an incorrect algorithm applied by the student which 
produces a consistent pattern of errors. If the scoring system assumes a single source
of student error or fails to probe for assumptions about the source of error, the scoring
may fail to reflect the more complex teaching.

As a whole, the SSI-EM gives a good picture of what Lee Shulman calls "peda-
gogical content knowledge." However, the methodology of semi-structured interviews is
still in need of improvement, particularly in the vignettes (discussed earlier) and the
construction of probes.

Use of probes. The purpose of probes was to assist in the interpretation of 
ambiguous responses. Responses could be ambiguous because of their brevity or be- 
cause of the use of educational jargon. Without elaboration, it was often difficult to 
distinguish a terse summary of a complex thought from a vague explanation which 
lacked depth of analysis. Assessors were carefully trained and monitored in practice 
administration on how to construct probes which were clarifying, and which did not cue
the teacher as to the appropriate response. In practice, however, assessors found the
use of probes problematic for several reasons. The first is that some of the questions
were unclear or used terminology with which teachers appeared unfamiliar. An example
of a type of question that was unclear was one asking the teachers to explain how they
might integrate instruction on how to use a particular mathematical algorithm into their
teaching when they had in the previous question explained why they found it unsuitable.
It was difficult to probe or even to respond to a teacher's questions if the assessor was
not clear about the intent of the question. Some teachers had difficulty responding to
questions asking about "mathematical concepts" or "mathematical reasoning," even after 
they had just described accurately how to work the problem. 

The second reason probes were problematic is the difficulty mentioned above of 
avoiding cues to the teacher about the correct answer. In the questions referred to
above, which asked about "mathematical concepts," it was particularly difficult for an
assessor to adequately explain the intent of the questions without giving cues about the
appropriate response. Scorers reported that assessors sometimes gave in to the tempta-
tion to lead the teacher to the correct answer, particularly when the teacher seemed to 
be slowly progressing toward the correct answer. 

The third problem with probes is one of standardization. Several assessors indi- 
cated that probing was the most difficult aspect of administration, not only because it
was difficult to use probes that were not leading, but also because tailoring probes to the
teachers caused some assessors to question the consistency with which they were using
probes. For example, because some teachers were clearly more nervous than others, the asses-
sors reported probing more gently or not at all when a teacher's tone and body language
indicated that the probes were increasing the teacher's frustration rather than facilitating
construction of an answer. This may have put the more nervous teachers at a disadvan-
tage relative to other more confident teachers. 

Inconsistency in the use of probes could be ameliorated by revising the questions 
so they more closely correspond to scoring criteria, and by improving assessor train-
ing. Other problems resulting from the use of interviewers may not be solved as easily,
however.



Use of interviewers. One can envision an alternate form of this assessment which 
asks the teachers to perform the same tasks and then respond to questions in a written
format. Although teachers were not asked to comment specifically on the use of inter-
viewers, many of the teachers did so either during informal discussions with the inter-
viewers or on the evaluation feedback forms. On the positive side, some teachers en-
joyed the interaction with an interviewer, typified by the teacher who appreciated the
opportunity to "think about and reflect on my newly gained teaching techniques."
These teachers seemed to enjoy the interaction with the interviewers, and did not feel 
intimidated. 

Other teachers, however, in both our pilot test and other studies (e.g., see Wilson,
1988) were uncomfortable being assessed through an interview. For instance, some
teachers mentioned the difficulty they had working one-on-one with someone in an arti-
ficial situation (i.e., an interview at an assessment center). One teacher commented that
"good teachers don't always interview well and vice versa." Another teacher explained:

I don't work well under pressure, in a one-on-one situation.
The tasks would have been no problem if I worked them out 
at home or in the classroom. 

This teacher was not the only one who found the assessment extremely stressful,
despite the best efforts of the assessors to put the teachers at ease and the relatively
informal atmosphere. The most extreme example of stress was one teacher who, after
struggling with the three tasks generally perceived to be the most difficult (Topic Se-
quencing, Instructional Vignettes, and Short Cuts), had a strong emotional reaction to
the interview and required over an hour to regain composure. While this was the only 
such instance among the 41 teachers, many teachers commented that their anxiety 
would be heightened considerably if they were participating in the assessment for 
credentialing purposes. 

At least some of the stress experienced by teachers was probably due to anxieties 
which are intensified by having another person witness your struggles. Several teachers
referred specifically to feelings of "inadequacy," "lowered self-confidence," and "incom-
petence" they experienced when they had difficulty answering some of the questions.
Other teachers, sometimes attending the same assessment administration as the teachers
reporting high anxiety, reported feeling nervous initially, but the assessors were able to
put them at ease. 

Assessors varied in their tone and degree of formality. One scorer described the 
range as from "very formal, cold and rather tense" to "relaxed, cordial and even playful
with the candidates." While the tone can be standardized somewhat through training, 
the anxiety which some candidates experience throughout the assessment regardless of 
assessor style is less easy to address. 

It is unclear why teachers in the SSI-EM expressed discomfort while none was 
reported or observed for the SSI-SM, which was similar in format. One possible expla-
nation is the differing target groups for which the SSI-EM and SSI-SM were designed.
The SSI-EM was originally designed for experienced teachers, while the SSI-SM was 
tailored for beginning teachers. Another possibility is that elementary teachers feel 
greater anxiety because they are not subject matter specialists, compared to the second- 
ary teachers, who had studied the discipline in which they were being assessed. Our
impression is that the SSI-EM assessors were also more aggressive in probing ambigu-
ous answers than those in the SSI-SM, providing numerous cues to teachers who were
not doing well.

Cost Analysis 

Cost projections for semi-structured interviews were discussed in Chapter 4 in 
connection with our experience pilot testing the SSI-SM. The costs for piloting the SSI-
SM were used since that assessment represented a later stage of development than the
SSI-EM, resulting in fewer implementation problems.

Technical Quality 

This section discusses three aspects of the technical quality of the SSI-EM: 
development, reliability and validity. 

Development 

The SSI-EM was based on four tasks that were developed, pilot tested, and field 
tested by the Stanford Teacher Assessment Project (TAP). Minor modifications were 
made in the interview protocols; major changes were made in the scoring criteria. The
original set of tasks focused on the topic of elementary fractions, particularly the simpli-
fication of fractions. Second versions of two of the tasks, Lesson Planning and Topic
Sequencing, were developed to determine the feasibility of using existing tasks as shells
to apply to new content.

The original tasks were developed by the Stanford TAP over a one-year period 
and were evaluated through a series of activities. First, the interview protocols were 
piloted with expert teachers, mathematicians and mathematics educators. Second, 
each protocol was critiqued by groups of "expert" teachers. Third, the instruments
were similarly reviewed by TAP's Expert Panel in mathematics, composed of mathemat-
ics educators, mathematicians, teacher educators, teachers, and TAP staff. 

Scoring instruments also went through a similar one-year, multi-staged develop- 
ment and revision process. First, TAP staff devised and tested various scoring formats, 
resulting in an eclectic set of 10 different scoring mechanisms; some were holistic, while
others were analytic. After these scoring procedures were applied to a few sample
protocols, their effectiveness was examined by a board composed of teachers, teacher
educators, researchers, and TAP staff. After revisions, the scoring procedures were sent
to a second board of examiners with a similar composition and once again revised. 
Teachers (who were not project staff) were trained to use this final set of scoring 
procedures to score the data collected at the TAP Assessment Center. The TAP staff 
collected feedback and analyzed these external scorings. 

The TAP instruments were revised for the SSI-EM in two ways. First, a TAP 
representative revised the interview protocols and scoring systems based on feedback
from [illegible] administered, scored, and analyzed the instruments. Second, the
interview protocols were revised to make them more appropriate for beginning teachers.
[illegible] instruments was to identify exemplary experienced teachers.
[illegible] instruments for beginning teachers, the TAP representative simplified
the tasks and revised the scoring criteria to reflect a less sophisticated level of expertise.
Due to time and budget constraints, these revisions were not subjected to wide review.



Reliability 

[illegible] inter-rater reliability and problems encountered in scoring suggest the
need for lengthening and strengthening the training of assessors and scorers. Assessors
need to be familiar with the scoring criteria to effectively conduct the interview. Scor-
ers need more training in rating performances, especially at the cutoff point.
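
The report does not say here which agreement statistic was used to gauge inter-rater reliability. As a minimal sketch only, assuming two scorers each rate the same set of responses as pass or fail, agreement could be summarized with simple percent agreement and with Cohen's kappa, which corrects for agreement expected by chance. The function names and the ratings below are hypothetical and were invented for illustration; they are not pilot test data.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of responses on which two scorers gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two scorers, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both scorers pick the same category
    # if each rated independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both scorers used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical ratings, invented for illustration only.
scorer_1 = ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "fail"]
scorer_2 = ["pass", "fail", "fail", "fail", "pass", "pass", "pass", "fail"]

print(percent_agreement(scorer_1, scorer_2))
print(cohens_kappa(scorer_1, scorer_2))
```

Disagreements concentrated on borderline responses near the cutoff point would lower both figures, which is one way the effect of additional scorer training could be checked.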

Two versions of tasks that addressed two different topics were piloted. In gener-
al, the questions in the interview protocols were similar and scoring categories were
[illegible]. No teacher performed both versions of the same task, so data on the reliabili-
ty of the SSI-EM across tasks is not available.

Validity 

The interview questions and scoring categories for the SSI-EM focus on very 
specific aspects of a teacher's performance. Although the interview questions have
undergone some review during development, both they and the scoring criteria would
[illegible] subjected to a wider review in order to ascertain whether or not they reflect
[illegible] professional consensus about important teaching competencies. This validity
study should be completed before extensive developmental work is done on the SSI-EM
or any other semi-structured interview. Since the SSI-EM scoring criteria are very spe-
cific, singling out particular aspects of a teacher's response, they would be especially
vulnerable to professional disagreements about important components of responses
and/or the relative importance of these components. Some of the scorers expressed
[illegible] comprehensiveness of some of the scoring categories that used a
checklist format.

Conclusions and Recommendations

This section contains conclusions and recommendations regarding the SSI-EM,
organized into the areas of administration, scoring, content, format, and a brief sum-
mary. 

Administration of Assessment

Like the other semi-structured interview that was pilot tested, the SSI-EM is very
labor intensive to administer; administration and scoring require one day per teacher. If
on-line scoring is developed and found to be feasible, the time required for administra- 
tion could be reduced slightly, but our experience with the CCI suggests that forming 
and documenting judgments is a time-consuming process. 

The following factors seem to be key to smooth implementation of the SSI-EM or 
any other semi-structured interview assessment: 

o availability of appropriate facilities (which are often difficult to locate);

o development of clear orientation materials for teachers, including de-
scriptions of the tasks and scoring criteria;

o organization of tasks so all tasks require approximately equal amounts
of time and only a minimal number of transitions are needed between
tasks;

o careful attention to the recording quality of the tape recorders used;

o coordination and management of a large number of materials and
pieces of equipment, including interview protocols, tape recorders, and
labeled tapes; and

o recruitment of assessors and scorers who are knowledgeable about
mathematics, mathematics pedagogy, and student characteristics.

Audiotaping proved adequate for documentation of the interviews. Checking the
recording quality before each interview seemed to minimize the chances of unrecognized
equipment failure; some recording problems were detected in the pilot test, and alterna-
tive equipment was used. Precautions need to be taken to minimize the chances that
assessors forget to turn on the tape recorder, such as oversized reminders printed in the
interview protocol. A policy of recording the entire interview, including the introductory
portion and those times when candidates are thinking and not speaking, would minimize
the chances of recording only a portion of the interview.

Assessor training should include familiarization with both the scoring criteria and
the recording equipment. Some controlled experimentation needs to be done with
probes to identify the kinds of situations in which probes are needed for scoring pur-
poses, the effects on performance when probes are and are not used, and variance in
assessors' use of probes. Such a study would inform the development of guidelines for
standardized use of probes.

As with the SSI-SM, the security needs of the semi-structured interview should be
studied to determine its robustness with respect to the development of standardized
responses.

Scoring

The scoring system of the SSI-EM is not suitable for adoption for a statewide
assessment. Scoring criteria that vary by task and by topic raise serious concerns about
reliability, validity and fairness across differing versions of the assessment. The scoring
[illegible] a more promising prototype which avoids these
problems.

[illegible] implementation of scoring criteria which were developed
after the administration of the assessment underscores the importance of developing
tasks and scoring criteria simultaneously and analyzing their alignment prior to pilot
[illegible].

[illegible] should include clear examples of responses in each rating category.
[illegible] training should be devoted to differentiating borderline responses,
especially at the cutoff point between passing and failing.

Assessment Content 

Observations and information collected from assessors, scorers and teachers
[illegible] the following conclusions about the content of the
SSI-EM:

o Assessment content was in line with the philosophy of the Mathematics
Framework. Congruence was good, though not complete, with respect
to areas of emphasis and characteristics of delivery of instruction.

o Coverage of the California Standards for Beginning Teachers could be
improved by refining current tasks and developing new ones. Some
standards covering teacher-student interaction or student outcomes
could only be indirectly addressed with this assessment.

o For the most part, the tasks were perceived by teachers as being job-
related. The major exception was Topic Sequencing.

o The content was difficult for the beginning teachers who participated in
the pilot test. Less than one-third of the teachers passed more than
half of the tasks. Some difficulties indicated a need for improvement in
teacher preparation; others were more likely the result of inexperience,
either in teaching in general or in teaching the particular topic.

o Teaching diverse students in a variety of contexts was not addressed;
teachers of low-achieving students felt disadvantaged with respect to the
questions and tasks.

o [illegible] teaching competencies were generally highly valued,
the SSI-EM was judged as a fair way to assess various teacher groups;
pilot test data were too limited, however, to draw conclusions about the
performance of minority teachers.

If the SSI-EM is chosen for further development, it would benefit from the same
type of content review by an expert panel that was recommended for the SSI-SM.






Assessment Format



Based on experience with the SSI-EM as a state-of-the-art prototype, the
strengths of the semi-structured interview format appear to be in assessing the ability to
plan instruction and the command of the subject at a conceptual level of understanding.
It appears weakest in assessing a teacher's ability to implement instruction and manage
the classroom.



Our experience in implementing the SSI-EM suggests that the following issues be
considered when contemplating adoption of the semi-structured interview format for
teacher assessment: 

o On numerous occasions, a teacher's lack of content knowledge affected
his/her ability to respond to questions and made it difficult to clarify
questions without revealing the nature of a correct response. Questions
that were subtly different were seen as equivalent by teachers who did
not comprehend the subtle differences.

o Vignettes need to be carefully constructed with a specific focus so all
relevant information can be included and the range of possible interpre-
tations is narrowed. Teacher assumptions that may affect the evalua-
tion of their responses need to be explored in the interview.

o The use of interviewers to collect data seems to heighten anxiety for 
many teachers, especially those teachers who are not performing well. 

In general, the use of interviewing, as compared to written or dictated responses 
to printed questions, should be explored further. The extent to which differences in
interviewing style affect teacher performance should be a key consideration.



Summary

If semi-structured interviews are selected as a method of assessing new teachers 
for credentialing purposes, a close review or study of the information yielded by the SSI-
EM tasks and questions could inform the development of prototypes. However, the SSI-
EM scoring system does not appear to be a promising approach that bears further
development.






CHAPTER 6: 
ELEMENTARY EDUCATION EXAMINATION 






The Elementary Education Examination is an innovative multiple-choice test
developed by IOX Assessment Associates for use by the State of Connecticut in the
licensure of elementary school teachers for grades K-8. The examination is part of a
three-tier assessment system currently under development by Connecticut. Under this
proposed system, the first assessment, a basic skills test in reading, writing, and
mathematics, is administered during prospective teachers' undergraduate programs.
Prospective elementary teachers who pass the basic skills test and successfully complete an
undergraduate degree and a teacher education program must then pass the Elementary
Education Examination. Candidates who pass the examination are given an initial
teaching certificate and can begin teaching. During their first year of teaching, they are
further evaluated through observation and/or performance assessments. Those who pass
this third round of assessment are then awarded a provisional teaching certificate.

The Elementary Education Examination focuses on three major competencies:
mastery of content knowledge, mastery of pedagogical knowledge, and mastery of
pedagogical content knowledge. The examination differs from more traditional
multiple-choice tests in two respects. First, the majority of questions, regardless of the
competency area being assessed, are embedded in classroom situations (e.g., "You are
planning a lesson on chemical changes. Which of the following....?"). Second, some of the
items (referred to as "materials-based items") ask the candidate to analyze reference
materials that are commonly used by classroom teachers. These materials include
Individual Education Plans (IEPs), student worksheets (some blank, some with student
work), lesson plans, report cards, and test reports. A description of a sample
"materials-based" item is as follows:

Given four worksheets of student learning activities, 
the examinee must identify the worksheet that most 
closely matches a specified objective. 

The administrative format of this assessment is the same as that used for other
large-scale multiple-choice examinations: Candidates come together at a test site and are
assessed individually by their written responses to a series of multiple-choice items. At
each administration of this assessment, six different forms of the exam were used in
order to pilot test a greater number of items. Each form consisted of 77 multiple-choice
items, with 17 items appearing on all six forms. In instances of differences in
performance across forms, the 17 linking items serve as indicators as to whether the differences
are due to the difficulty of the items or to differences in the ability levels of the
examinees. Although two hours was the suggested time for the examination, time limits were
not established during this pilot phase.
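
The role of the 17 linking items can be illustrated with a small sketch. This is not the procedure used by the test developer, which the report does not describe; it is a minimal illustration under stated assumptions. The function names are hypothetical, the item count assumes that the 60 non-linking items on each form never repeat across forms (so the result is an upper bound), and the comparison simply treats the gap on the shared items as a difference between examinee groups and whatever remains as a difference in the forms.

```python
# Hypothetical sketch of the common-item design described above:
# 6 forms, 77 items per form, 17 linking items shared by every form.

def unique_items_piloted(forms, items_per_form, linking_items):
    """Upper bound on distinct items piloted.

    Assumption (not stated in the report): the non-linking items on
    each form do not appear on any other form.
    """
    return linking_items + forms * (items_per_form - linking_items)


def compare_forms(link_a, link_b, total_a, total_b):
    """Split a raw gap in mean percent correct between two forms.

    link_a, link_b   -- mean percent correct on the shared linking items
    total_a, total_b -- mean percent correct on the whole form
    Returns (ability_gap, difficulty_gap): the gap on identical items is
    read as a difference between the examinee groups; the remainder is
    read as a difference in the difficulty of the forms.
    """
    ability_gap = link_a - link_b
    difficulty_gap = (total_a - total_b) - ability_gap
    return ability_gap, difficulty_gap


print(unique_items_piloted(6, 77, 17))        # 377
print(compare_forms(70.0, 70.0, 75.0, 70.0))  # (0.0, 5.0)
```

In the second call the two groups perform identically on the linking items, so the entire five-point gap is attributed to the forms rather than to the examinees. An operational program would more likely use a formal equating method; this sketch only shows the logic of the comparison.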

Beginning with information on the administration of the assessment, this chapter
continues with a discussion of the content and the format. Following these discussions
are analyses of the cost and technical quality of the assessment. The chapter concludes
with an overall summary, together with recommendations for further steps in exploring 







such as the 



Administration of Assessment



This section begins with an overview of the administration of the assessment. It
then covers the following: logistics (e.g., development of orientation materials,
identification of teacher samples, scheduling), security arrangements, assessors,
scoring, and teacher and FWL and RMC staff perceptions of the administration.



Overview 



The Elementary Education Examination was administered at nine sites to both
project and nonproject teachers. Table 6.1 contains information about the pilot testing
of the assessment. Over 250 teachers from a total of nine projects, plus over 300
teachers from 11 nonproject districts, were invited to participate in the assessment. Ten
administrations were scheduled between May 4, 1989, and June 13, 1989, all but one of
which were scheduled in the late afternoon from 4:00 to 6:00 p.m. One administration
was scheduled in two shifts on a Saturday morning, and one was canceled by IOX
Assessment Associates due to an anticipated poor turnout (only six out of 16 teachers had
said they would come) and a lack of staff to administer the examination.

A total of 138 teachers participated in the Elementary Education Examination
assessment (or approximately 25% of the teachers who were invited to participate).
Based on the teacher feedback forms completed by 137 of the teachers, 121 (88%)
females and 14 (10%) males were assessed (two respondents did not report their
gender). [Percentage illegible] of the teachers described themselves as White or
Caucasian, 12% (17) as Hispanic (or Chicano or Latino), and 3% as [illegible]. A
breakdown of the remaining 8% of the teachers is as follows: American Indian (3),
Asian (2), Pacific Islander (1), Other (3), and Not Reported (2).
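
The rounded percentages in this paragraph can be checked against the counts reported in the text and in Table 6.1. The sketch below is only an arithmetic check; the helper name `percent` is hypothetical, and the number invited is taken as the sum of the two Table 6.1 totals (256 project and 303 nonproject teachers).

```python
# Arithmetic check of the participation figures, using counts from the text
# and the Table 6.1 totals. The helper name is an illustrative assumption.

def percent(part, whole):
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


invited = 256 + 303               # project + nonproject teachers invited
print(percent(138, invited))      # participation rate among those invited
print(percent(121, 137))          # females among feedback-form respondents
print(percent(14, 137))           # males among feedback-form respondents
```

The three values come out at 25, 88, and 10, which agrees with the percentages quoted above.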

Logistics 

Logistical activities for this assessment included the development of orientation
materials, identification of teacher samples, scheduling the test administrations, making
site/facilities arrangements, arranging for the assessment materials, developing evaluation
feedback forms, securing the evaluation feedback, and reimbursing the teacher
participants.

The orientation materials developed for this assessment were very important
because they were the means by which teachers were invited to participate in this
assessment. These materials included a letter which described the pilot testing project and
the Elementary Education Examination, and specified the date, time, and location of the
assessment administration. The letter also informed teachers that they would receive
$25.00 for their participation as well as mileage expenses if they had to travel more than
30 miles to the test site. Attached to this letter was a self-addressed, stamped postcard






TABLE 6.1

ELEMENTARY EDUCATION EXAMINATION:
PILOT TEST PARTICIPANTS

(Total Number of Teachers = 138)

                                      Number of Teachers Invited   Number of Teachers
Date       Project                    Project      Nonproject      Who Took Test

May 6      Santa Barbara/Ventura         30             7                  9
May 20     Riverside/San Bernardino     119                               30
May 30     Santa Cruz                    36                               34
May 31     Santa Clara                    9            98+*                8
June 1     Centralia                     12**          65+*                8
June 6     Long Beach                     7            86***              12
June 7     El Cajon                      25            25                 26
June 8     Poway                         18                                6
June 13    Vista****                                   22                  5

Total                                   256           303                138

*Some districts invited their 1st-year teachers but did not release their names to us.
**1 Centralia Project teacher and 11 Irvine Project teachers.
***Includes 45 Lynwood District teachers who were also invited to the Centralia assessment.
****Nonproject district.






that the teachers were asked to return informing us whether or not they were able to
participate in the pilot test.

All teachers selected for the sample for this assessment were sent the orientation
materials. This sample included (1) all elementary teachers from the selected California
New Teacher Projects who were not scheduled to participate in other pilot tests or any
of the Project evaluation activities being conducted by SWRL, and (2) beginning
teachers from neighboring Nonproject districts.

After consulting with Project Directors about optimal times, the administrations
were primarily scheduled for weekday afternoons. Although original plans called for two
or more Projects to participate at each test site, FWL staff found it difficult to find sites
that were both geographically centralized and required minimal teacher travel time.
Hence, the majority of administrations included only a single Project and its neighboring
districts. All administrations were conducted in a school auditorium, cafeteria, or
classroom.



Upon completion of the assessment, each teacher was asked to complete an
evaluation feedback form and to sign a list which served to verify participation in the
assessment. All teachers who signed the list were then mailed a check for $25.00 plus
mileage costs if they had to travel more than 30 miles to participate.

Security 

IOX Assessment Associates assumed full responsibility for all security
arrangements of the test materials and test administrations. The test booklets were numbered,
and all booklets were logged in when they were returned by the teachers with the
completed answer sheets. Different forms of the test were distributed, minimizing the
opportunities for one person to copy another's responses.

Each administration of the examination was supervised by an IOX representative;
at no administration, however, was anyone designated as a proctor. For any future
administrations in California, a proctor should be present to assist in the test
administration and ensure security. Detailed instructions on the security of test materials should
be available to the proctor, as well as instructions as to placement of test materials
during the administration, counts needed at various stages, and actions to be taken if any
materials are missing.

For this pilot test, scrap paper was permitted during the administrations and
passed out with the test books and answer sheets. Examinees were also allowed to
request additional paper if they needed it. This practice poses a potential security
problem. How is the supervisor or proctor to know how many sheets of scrap paper
are given to each examinee and to ensure that every piece of scrap paper is collected?
Such a procedure makes it quite easy for examinees to copy items, note important
information on the test's contents, and remove the paper from the room. Even though
examinees are told not to mark in test books, they sometimes make marks or leave
smudges by resting their fingers on certain parts of the stimulus materials or next to
their answers; this may provide information for future examinees using that test book.
If the Elementary Education Examination or a similarly innovative multiple-choice test 






is selected for use in California, we suggest having nonreusable test books and not
allowing scrap paper.



Assessors 

An IOX Assessment Associates representative based in Southern California was
responsible for administering the Elementary Education Examination assessment. At
each site, the test administrator gave a standardized oral overview of the assessment,
directed the teachers on how to take the exam, distributed the test materials, and
collected the materials.



Scoring 

The Elementary Education Examination answer sheets were in a machine- 
scorable format and were scored by the Hacienda/La Puente School District in Southern 
California. Because we have no information about the quality control procedures 
employed in the scoring, we cannot comment on this aspect of the administration. 
Whatever procedures are employed, they must ensure a very high level of accuracy of 
scoring if the results are to be used in making decisions about credentialing of new
teachers, and they must include procedures for dealing with unclear erasures, multiple 
marks, light marks, incorrectly keyed items, printing errors in test books, and other 
problems that affect scoring results. 

As with any multiple-choice test, a teacher's score for this examination reflects 
the number of items for which the teacher marked the correct answer. The results for 
this test were reported in terms of the mean p-value, or in other words, as the percent 
of examinees marking an item correctly, averaged across all items in each of six subject 
areas: Human Development and Instructional Methods, Language Arts, Mathematics, 
Social Studies, Science, and Other (i.e., multicultural, arts, physical education, health, 
and special education). (See Table 6.2 for the results shown as mean p-values.) 
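
The mean p-value statistic described above can be sketched in a few lines. This is a minimal illustration, not the scoring procedure actually used (which the report says is undocumented); the function names, the answer key, and the four sets of responses are all invented for the example.

```python
# Minimal sketch of the "mean p-value" statistic: the proportion of examinees
# answering each item correctly, averaged over the items in a subject area.
# All names and data below are hypothetical.

def item_p_values(responses, key):
    """Proportion of examinees who marked the keyed answer, for each item."""
    n_examinees = len(responses)
    return [
        sum(1 for answers in responses if answers[i] == key[i]) / n_examinees
        for i in range(len(key))
    ]


def mean_p_value(p_values):
    """Average of the item p-values for one subject area."""
    return sum(p_values) / len(p_values)


key = ["A", "C", "B", "D"]        # invented answer key for a 4-item area
responses = [                     # invented answers from four examinees
    ["A", "C", "B", "D"],
    ["A", "C", "D", "D"],
    ["A", "B", "B", "A"],
    ["B", "C", "B", "A"],
]

p_values = item_p_values(responses, key)
print(p_values)                   # [0.75, 0.75, 0.75, 0.5]
print(mean_p_value(p_values))     # 0.6875
```

Expressed as a percentage, the invented area in this example would be reported as about 69% correct, in the same form as the entries in Table 6.2.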



Teacher, FWL, and RMC Staff Impressions of Administration

Approximately 63% (86 of 137) of the teachers who filled out evaluation
feedback forms stated that the written orientation materials they received before the
assessment were helpful. Teacher suggestions for improving the orientation materials were as
follows: be more specific about test objectives; state that specific knowledge will be
asked for in content areas; include a sample page from the test booklet; and reduce the
length and wordiness of the orientation letters.

An even higher percentage of teachers, 71% (97 of 137), found the arrangements
for this assessment (e.g., scheduling, room arrangements, and travel distance to
assessment site) to be reasonable. From the teachers who found the arrangements to be
unreasonable or who had suggestions for improvement, there were 21 comments about
scheduling, specifically about the end-of-year date and/or the afternoon time. Basically,
the teachers did not like the test being scheduled during the end of the school year
because of the many school activities happening then (e.g., report cards, end-of-year
parties or trips), nor did they like being tested after school because they said they were






TABLE 6.2

ELEMENTARY EDUCATION EXAMINATION
SPRING 1989 PILOT TEST RESULTS

                                                     AVERAGE PERCENTAGE
                              AVERAGE NO.            OF ITEMS CORRECT
AREA                          PER FORM               (N=138)

Human Development and
Instructional Methods             12                      71%

Language Arts                     23                      73%

Mathematics                       18                      68%

Social Studies                     9                      68%

Science                            8                      71%

Other*                             7                      78%

*Multicultural Education, Arts, Physical Education, Health, and Special Education






too tired and it was "too difficult to concentrate." Some of the teachers who
participated in the Saturday morning assessment also commented about the scheduling, saying the
Saturday morning time was "inconvenient" because it cut into their classroom planning
time.

Other complaints or suggestions for improvement made by teachers were as
follows:

Improve facilities - 10 
Increase notification time - 9 
Reduce travel distance - 7 

Of the complaints about facilities, most pertained to the room being too noisy or too 
warm. Increased notification time was desired by those teachers who did not receive 
their orientation materials until a day or two before the assessment date. The teachers 
who wanted reduced travel distance were, for the most part, those teachers who had to 
travel up to 75 miles to participate in the San Bernardino/Riverside assessment. 

Assessing the teachers after a full day of teaching during a very busy and hectic 
time of year was not an optimal choice in timing, and could be the reason for such low 
participation levels. Other than low levels of participation, no serious administration 
problems were experienced. 



Assessment Content 



As mentioned earlier, the Elementary Education Examination covers three major
competencies: (1) content knowledge, (2) pedagogical knowledge, and (3) pedagogical
content knowledge. The content areas include nine subjects that are taught in the
elementary curriculum: reading/language arts, mathematics, social studies, science, the
arts, physical education, health, special education, and multicultural/bilingual education.
Pedagogical knowledge includes topics such as human development, classroom
management, and student motivation. Pedagogical content knowledge refers to the application
of pedagogical principles to specific subject areas, such as those listed above.

The developers of the Elementary Education Examination focused their efforts on
identifying the knowledge and skills that are necessary for competence, or satisfactory
performance, as an elementary teacher. The majority of items on each of the six test
forms were designed to assess content knowledge or pedagogical content knowledge. As
shown on Table 6.2, however, the average number of items representing each subject
area differs. The subject area of reading/language arts is represented by the largest
number of items (23-24 items on each form), while math is second (18 items on each
form). Within a subject area, the proportion of items which assess content knowledge
versus pedagogical content knowledge also differs. This difference reflects the fact that
(1) groups of Connecticut teachers and subject-matter specialists established guidelines
for item development separately for each subject area, and (2) in some of the subject
matter fields (e.g., social studies and science) there is little consensus regarding the best
ways to organize and present content to students. Thus, in reading and math, two areas
where there is relatively high consensus as to the best ways of applying content to
pedagogy, there are proportionately more pedagogical content items than in the areas of






social studies and science, where there is less consensus about content
pedagogy. Among the items that assess content pedagogy are the "materials-based"
items, which ask the teacher to analyze reference materials that are commonly
used on the job (e.g., worksheets, test reports).

The content of the Elementary Education Examination is discussed along the
following dimensions:

o Congruence with curriculum guide or framework emphasis;

o Extent of coverage of California Standards for Beginning Teachers;

o Job-relatedness;

o Appropriateness for beginning teachers;

o Appropriateness across different contexts (e.g., grade levels, subject areas);

o Fairness across groups (e.g., ethnic groups, gender) of teachers;

o Comparison with other similar instruments; and

o Appropriateness of the instrument as a method of assessment.

The discussions of the first two dimensions, which refer to curriculum congruence and
standards coverage, are based on FWL staff analysis of the examination. The
discussions of the remaining dimensions are based on the perspective of the participating
teachers, as reflected in feedback forms and test results, and on the perspective of
FWL staff. In addition, because the actual test items are the property of the State of
Connecticut, we can only discuss the content in a general way, without referring to or
describing specific items.

Congruence with California Curriculum Guides and Frameworks

Because it was commissioned by Connecticut, the Elementary Education Examination
was not designed to be congruent with California's Model Curriculum Guides and
Frameworks. However, because it is being evaluated in relation to California's
credentialing process, FWL staff looked at the assessment to see in which areas there is
congruence with the guides and frameworks and in which areas there is not. In
particular, staff compared the assessment with the English-Language Arts Guide, the
Mathematics Framework, the Science Guide, and the History-Social Studies Framework.
In addition, the objectives for the examination listed by IOX Assessment Associates were
checked for congruence.

pilinesteVlite^Tl^ ''''''^ "^'^ developed and 




As will be evident in the following discussions, the guides and frameworks vary 
markedly across subject areas in terms of curricular aspects discussed, such as philosophy 
of instruction, curriculum content at specified grade levels, and desired characteristics of 
instruction. 

The English-Language Arts Guide has 22 guidelines categorized into five major
groupings. The first grouping emphasizes the reading and the study of significant
literary works. Although one of the exam's 11 reading/language arts objectives mentions
"the use...of children's literature selections," FWL staff's analysis of the 20-24
reading/language arts items on each of the two test forms did not reveal any items that
dealt specifically with a literature-based reading program. Since a major thrust of
California's reading/language arts curriculum is a literature-based program, the exam would
need revisions to address this area. The second grouping emphasizes classroom
instruction based on students' experiences. Our analysis again revealed a lack of items and
objectives corresponding to this grouping. The third grouping refers to an interrelated
program of listening, speaking, reading, and writing. The majority of items on each
analyzed form and the majority of the exam's objectives fall into this category. In
particular, many of the items focused on reading comprehension and decoding strategies, as
well as on the writing process. The fourth grouping emphasizes a program that is
integrated across the curriculum. This emphasis corresponds with one of the exam's
objectives, and there are between one and three corresponding items on each of the forms
analyzed. Finally, the fifth grouping focuses on assessment methods. Three of the exam's 11
objectives correspond to this focus, as do approximately three items on each form
analyzed.

The Mathematics Framework discusses curricular content and characteristics of 
instruction. Curricular content is organized into five major emphases: problem solving, 
calculator technology, computational skills, estimation and mental arithmetic, and 
computers in mathematics education. The nine mathematics objectives for the exam and 
the 18 math items on each form analyzed address three of these areas: problem solving, 
computation, and estimation. None of the objectives or items refer to calculator tech- 
nology or computers in mathematics education. The bulk of the items refer to computa- 
tional skills, with a few items addressing problem solving, and only one item on each 
form analyzed referring to estimation. Ten characteristics of instruction are described by 
the framework. In the two forms analyzed, six of the characteristics are addressed by 
test items: Teaching for Understanding, Reinforcement of Concepts and Skills, Problem 
Solving, Use of Concrete Materials, Corrective Instruction/Remediation, and Mathemati- 
cal Language. There are no items, however, that address the characteristics of instruc- 
tion described as Situational Lessons, Flexibility of Instruction, Cooperative Learning 
Groups, and Questioning and Responding. Should the exam be used in California
teacher assessment, consideration should be given to ensuring that these characteristics of
instruction are reflected in the test objectives and in actual test items on each form.

The Science Guide describes science programs for grades K-3, 4-6, and 7-8. Each 
program is divided into three areas: biological science, earth science, and physical 
science. Although the eight science items on each of the two forms analyzed cover all 
three areas, the number of items per subject area is not the same. Form #5, for exam- 
ple, has only one earth science item, but four physical science items. There is also an 
imbalance in the number of items distributed among grade levels. Almost all of the 
science items pertain to content knowledge specified in the guide for grades 4-6 or 7-8.
On the two forms analyzed, the greatest number of items pertaining to a K-3 program is 






[number illegible]. ([Text illegible] which may pertain to a K-3 or 4-6 program.) The
exam would need revision to ensure balance in representation of content areas
and grade levels.

The History-Social Studies Framework first specifies curriculum goals and strands
and then describes a sequential curriculum for grades K-12. Goals and strands are
organized into three broad categories: (1) knowledge and cultural understanding,
(2) democratic understanding and civic values (e.g., constitutional heritage, civic rights
and responsibilities), and (3) skills attainment and social participation (e.g., critical
thinking skills and participation skills). Each category is addressed by at least one of the
eight social studies objectives and by the social studies items on each form analyzed.
There is, however, again an imbalance in the number of items distributed among grade
levels. On each of the two forms analyzed, only one of the social studies items pertains
to grades K-3. In the California context, the exam [remainder of sentence illegible].

In summation, FWL staff would describe the congruence of the Elementary
Education Examination with the English-Language Arts Guide and the Mathematics
Framework as fair. Items could be added to the exam to ensure that all major emphases
are addressed. The congruence of the exam with the Science Guide and the History-Social
Studies Framework could be improved: both the science and the social studies items need
a better balance in the number of items pertaining to different grade levels, and the
science items also need a better balance of items pertaining to the different content areas.



Extent of Coverage of California Standards for Beginning Teachers

Although this assessment was developed for preservice or beginning teachers, it
was not developed with the California Standards for Beginning Teachers in mind. These
standards (Standards 22 through 32) describe levels of competence and performance that
teacher candidates are expected to attain as a condition for [text illegible]. To determine
how well the Elementary Education Examination is able to assess the pedagogical
knowledge which the standards represent, FWL staff analyzed the objectives and items
pertaining to pedagogical knowledge (i.e., knowledge of human development and
instructional methods) to see how well they are congruent with the California standards.
[Text illegible.] Listed below are brief descriptions of the standards (standards appear
in italics), accompanied by descriptions of the focus of the test items (based on the
exam's objectives) that correspond to the standards.



Standard 22: Student Rapport and Classroom Environment. Each candidate
establishes and sustains a level of student rapport and a classroom environment that
promotes learning and equity, and that fosters mutual respect among the persons in a
class. This standard is not addressed directly, but some test items assess a teacher's






knowledge of methods that enhance students' self-concepts, increase students'
motivation, and promote positive attitudes towards learning, all factors which contribute to
rapport with students and help establish the classroom environment.

Standard 23: Curricular and Instructional Planning Skills. Each candidate
prepares at least one unit plan and several lesson plans that include goals, objectives,
strategies, activities, materials and assessment plans that are well defined and
coordinated with each other. Some test items require teachers to demonstrate knowledge of
appropriate instructional objectives and lesson plans; others require teachers to
demonstrate knowledge of appropriate sequences for presenting concepts in an instructional
unit and analysis of complex concepts into their constituent parts.

Standard 24: Diverse and Appropriate Teaching. Each candidate prepares and
uses instructional strategies, activities and materials that are appropriate for students
with diverse needs, interests and learning styles. Some test items focus on the selection
of learning activities, materials, and explanations based on students' characteristics and
the skills and concepts to be learned.

Standard 25: Student Motivation, Involvement and Conduct. Each candidate
motivates and sustains student interest, involvement and appropriate conduct equitably
during a variety of class activities. Some test items assess a teacher's knowledge of
methods of increasing students' motivation; others assess a teacher's knowledge of
behavior and classroom management strategies that promote prosocial behavior.

Standard 26: Presentation Skills. Each candidate communicates effectively by
presenting ideas and instructions clearly and meaningfully to students. None of the test
items assess a teacher's ability to communicate effectively with students.

Standard 27: Student Diagnosis, Achievement and Evaluation. Each candidate
identifies students' prior attainments, achieves significant instructional objectives, and
evaluates the achievements of the students in a class. Some test items address the
provision of appropriate diagnosis of student difficulties; others are concerned with the
appropriate interpretation of data from diagnostic tests and cumulative folders.

Standard 28: Cognitive Outcomes of Teaching. Each candidate improves the
ability of students in a class to evaluate information, think analytically, and reach sound
conclusions. Some test items assess a teacher's knowledge of effective questioning
strategies that can improve a student's ability to evaluate information and/or think
analytically.

Standard 29: Affective Outcomes of Teaching. Each candidate fosters positive
student attitudes toward the subjects learned, the students themselves, and their
capacity to become independent learners. Some test items require a teacher to demonstrate
knowledge of methods of enhancing students' self-concepts and promoting positive
attitudes toward learning; other items focus on strategies to encourage students to assume
increasing responsibility for themselves.

Standard 30: Capacity to Teach Crossculturally. Each candidate demonstrates
compatibility with, and ability to teach, students who are different from the candidate.
The differences between students and the candidate should include ethnic, cultural,






gender, linguistic and socioeconomic differences. Some test items assess a teacher's
[text illegible].

[Text illegible] the Elementary Education Examination covers most of the California
Standards for Beginning Teachers but does not do so in any depth. Should the [text
illegible]



Job-relatedness



Teachers who participated in this assessment were asked if they felt that the
subject areas and concepts chosen for the assessment were relevant to their job of
teaching. [Percentage illegible] responded affirmatively. Many of these teachers,
however, qualified their answers by responding that some of the questions were more
relevant than others. [Text illegible] were cited by some teachers as being [text illegible].

Some teachers specifically commented that the subject areas and concepts chosen
were not relevant to their job of teaching because they teach at a grade level for which
many of these subject areas and concepts are not applicable.






TABLE 6.3

EXTENT OF COVERAGE BY THE ELEMENTARY EDUCATION EXAMINATION
OF CALIFORNIA STANDARDS FOR BEGINNING TEACHERS

                                          Content Focus of Test
                                          Items and Objectives               Extent of
Standard                                  Corresponding to Standard          Coverage

22: Student Rapport and Classroom         -Student's self-concepts,          Partial
    Environment                            motivation, positive attitudes
                                           toward learning

23: Curricular and Instructional          -Instructional objectives and      Partial
    Planning Skills                        lesson plans
                                          -Sequencing/analysis of
                                           instructional concepts

24: Diverse and Appropriate Teaching      -Learning activities, materials,   Partial
                                           and explanations based on
                                           students' characteristics and
                                           skills to be learned

25: Student Motivation, Involvement,      -Student's self-concepts,          Partial
    and Conduct                            motivation, positive attitudes
                                           toward learning
                                          -Behavior/classroom management
                                           strategies

26: Presentation Skills                   -Not covered                       None

27: Student Diagnosis, Achievement,       -Diagnosis of student difficulties Partial
    and Evaluation                        -Interpreting test/cum folder data

28: Cognitive Outcomes of Teaching        -Questioning strategies            Partial

29: Affective Outcomes of Teaching        -Student's self-concepts,          Partial
                                           motivation, positive attitudes
                                           toward learning
                                          -Behavior/classroom management
                                           strategies

30: Capacity to Teach Crossculturally     -ESL instruction                   Partial
                                          -Learning styles to teach/
                                           student interaction

31: Readiness for Diverse                 -Not covered                       None
    Responsibilities

32: Professional Obligations              -Not covered                       None






Primary-grade teachers especially felt that many of the test items were not relevant to
their grade level. One teacher wrote:

I am a primary-grade teacher. I have no desire to teach
the concepts that were on the test. I am certain this
has no reflection on me as a first-grade teacher.

One option would be to have separate tests developed for primary- and
upper-grade elementary teachers. (If separate tests were developed, however, it seems
that [text illegible] would no longer be appropriate.)

A special feature of this examination is the inclusion of materials-based 
items which require the teacher to analyze materials commonly used in the classroom, 
such as lesson plans. Teachers were asked how well these items reflect the 
tasks they perform as teachers. Sixty-six percent of the teachers (91 of 137) responded 
positively, with answers generally ranging from "ok" to "great." One teacher responded: 

The materials-based items were quite reflective of my teaching. 

They are irrelevant in terms of really knowing the student 
and their intentions and the materials available. 

Other teachers commented that the items reflected the "ideal" and not the "real" 

i^ZT^.iZ^^^^^ r^^P'^.'. '-^^^"^^ °f the mSls^lmii to inte 

Sroom '"^PP'°P"^^' "^^g'"^"^ majority/Limited English Profic ent (LEP) 

Appropriateness for Beginning Teachers 

and abniS'n"eS?d'^m^V^n^^^^^^ '"S"'"^ opportunity to acquire the knowledge 

of thf efcher, m ^ ^^"1?"^^ ^° assessment questions, 83% 

rOnfJfrhPr . '^^^ responded affirmatively and 22 (16%) teachers said "no." 
;£ L!! ^^jority of teachers also did not frid any oarts of 

?arts 0 thTassessmeXr^^ T 9^ ^^^^^^^ ^^o did ToT^ 

!?6%)i^!^rr^^^^^^^ °^ -th were identified by 22 

shows ^f^^f^^rl ;^^?l«"^«n;ary Education Examination test results (Table 6.2) 
ac?os content aS If° hXh^'» teachers marke orrectly is rou^ly equivalent 
SchJdPH /i^r nfl f ' ^ ^=^^^6°^ ("^^"^^ represents several content areas) is 

excluded, the average percent of items answered correctly ranges from a low of 68% in 

mo ? t?aS^^^^ ' ''f - language S't'us iv^nlhoufh" 

s^ssn?nt Qu^ '"P°"^ ^" ^ reasonable manner to the Is- 

sessment questions, the teachers correctly answered, on average, only two-thirds to 






three-fourths of the items in each area. Based on these results, there is a distinct 
possibility that the teachers were insufficiently prepared, either through education or 
experience, for this assessment. 



Appropriateness across Contexts 



Fifty percent of the teachers (69 out of 137) assessed believe this assessment is 
appropriate for teachers in different contexts (i.e., across grade levels, subject areas, and 
various student groups); 39% (53 teachers) think it is not, and 11% (15 teachers) did 
not respond at all or gave ambiguous answers. A closer look at appropriateness across 
grade levels and subject areas and for teachers of diverse student groups is taken below. 
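
The percentages reported in this paragraph can be checked directly from the counts. The following is a minimal sketch that uses only the three counts given in the text (69, 53, and 15 of 137 teachers); the variable and function names are illustrative and are not part of the original analysis.

```python
# Response counts reported for "appropriateness across contexts" (N = 137).
counts = {
    "appropriate": 69,
    "not appropriate": 53,
    "no or ambiguous response": 15,
}

total = sum(counts.values())  # expected to equal the 137 teachers assessed


def percent(count, total):
    """Return the share of the total as a whole-number percentage (round half up)."""
    return int(count * 100 / total + 0.5)


shares = {label: percent(n, total) for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n} of {total} ({shares[label]}%)")
```

The same helper can be applied to the other response tallies in this chapter, each of which is reported against the same base of 137 teachers.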

Grade level and subject area. Of the teachers who responded negatively to the 
question of appropriateness across contexts, the majority found the assessment to be 
inappropriate for teachers across grade levels. There were again many suggestions for 
developing separate tests for primary-grade teachers and upper-elementary teachers. 
Although it could well be argued that all elementary school teachers should have mastery 
of content knowledge and pedagogical knowledge associated with grades K-8 (i.e., 
the grades corresponding to the elementary credential), it is interesting that many of the 
teachers do not share this point of view. 

Our analysis of the test items found there is a disproportionate number of items 
representing the subject areas and concepts associated with the upper grades (i.e., 
grades 4-8), especially in the areas of math, science, and social studies. Of the eight 
science items on Form #5, for example, there is only one item related to the primary 
grades, but at least three items specifically geared to grades 7-8 and three for grades 4-6. 
(It is unclear whether the content of one other item is appropriate for grades 7-8 or 
grades 4-6.) If a single test is used for both primary- and upper-grade elementary 
teachers, there should be a better balance of items representing the different grade levels. 

Diverse students. Seven of the 53 teachers with negative responses deemed the 
assessment to be inappropriate for teachers of bilingual or LEP students, and three 
teachers found it inappropriate for teachers of high- or low-ability students. Observed 
one teacher: 

In many parts of California, the classroom population reflects 
a greater proportion of bilingual students as well as lower- 
achieving students than represented by your test. 

In our reading of the exam's objectives, we found two (out of 61) that pertain to 
multicultural/bilingual education. One objective is to assess a teacher's knowledge of the 
effects of acquiring English as a second language on students' cognitive and social- 
emotional development. The other assesses a teacher's knowledge of learning styles and 
teacher/student interactions. Although we believe that both of these objectives are good 
ones (especially in the California context), our analysis of the test items revealed that in 
some of the forms there are no items that address teaching bilingual/LEP students or 
students who are characterized as low- or high-ability, and at most there are two items 
corresponding to these objectives. The test could be improved by adding more items 






foweStd^enll^^^;^^^^^^^^^^ ii)j!|-ts and low- or high-ability students; 
context (i.e., a WKhrabiiiS^^s?ud^^^^ siuaents is often dependent upon the 

student n anotKSoTS Fn^tM^^ classroom mav be considered a low-ability 
ly define the ?oS stuT^^^^^^ t(,at are develooed should cLful- 

strengthen the^padty of tKf^^ °i °^ ^^"•'^ also 

competencies encompassed by Standard 24 for teaching d^erse's^udents^^^ teaching 

Fairness across Groups of Teachers 

new tea&fbXSrf &te"^^^^^^ ''VS'' this assessment to be fair to 
other groups of new tfachen F^^ef iJr^nt?2?E ^^""^T Wv^"'^ 

teachers) did not answer or pave an Lw^. ^ teachers) disaped, and 9% (13 

response^ cons"SedTa sSLPyes 'a^^^^^^^ feTn/V.'^^^^"^^ °^ P°^'^^^« 
follows: ^ ^ answer, a few of these answers were qualified as 

Yes, unless they are only planning to teach very 
restricted and/or specialized subjects and/or students. 

Yes, if it's translated for different language groups. 

the teaJherl ho'iSSrtlt?h?f '^"'^ ""^^'^^"^^ ^^1^^" their answer; seven of 

fully English-proficient. 

Appropriateness as Method of Assessment 

assessinI?e\"chfrtmS^^^^^^ °^ ^" appropriate way of 

and pedlSal-con^nf S P^^^ogical knowledge, 

ly, 46% & teachers w;,S^n^^^^ ^^^''^ers (72 of 137) responded positive' 

responded aSiSw and the ren.aining 8% (lO teachers) either 

ly usually dfd soS a s^"^^^ «veV» f'^ at an The Jeachers who responded positive- 

} uiu j,o wiin a simple yes. A few teachers, however, elaborated further: 

This assessment enabled me to think back to "school 
days." Often we forget to think about theory, etc. 

Y^^he assessment forces you to tbbtk thoroughly and effec 

S ^Y^^Pftent as someone who scores lower. It's how 
the knowledge is used daily in the classroom that counts! 






This sentiment was echoed by many teachers. Some of the teachers indicated 
that some people don't test well but perform well in the classroom, and others test well 
but are poor teachers. Some teachers stated that this assessment does a better job 
assessing a teacher's reading ability than teaching ability. Other teachers expressed the 
opinion that this assessment is not sufficient to assess a teacher's competency, but should 
be accompanied by, or replaced with, interviews and/or classroom observations. Even 
many of the teachers who said "yes," qualified their answer with the proviso that this 
type of assessment should not be the sole measure of a teacher's competency. 

Along the same line, teachers - particularly primary-grade teachers - often 
commented that it was not appropriate to assess a teacher on content knowledge that 
was not used in their grade level. Numerous teachers suggested that, in order to be fair, 
separate tests should be created for lower-grade (K-3) and upper-grade (4-6) elementary 
teachers. For example, one first-grade teacher commented: 

I think some of the questions can't be completely 
appreciated by new teachers unless they've taught in 
a particular grade level. Maybe a test could be made 
that's more primary-oriented for someone like myself. 

Even some of the upper-grade elementary teachers felt that some of the questions 
(especially some of the science questions) were more appropriate for middle-school or 
secondary-school teachers and suggested that a division of tests should be developed 
accordingly. (As we indicated earlier, our analysis of the test items revealed that a 
disproportionate number of the eight science items on the two forms analyzed were 
geared for grades 7-8, or the middle school level.) 

For some of the content areas, such as social studies and science, our analysis 
indicates that the focus of the items is almost exclusively on content knowledge rather 
than on how to teach the content (i.e., content pedagogy). The reasons for this can 
probably be found in our earlier discussion of the constraints experienced by the test 
developers when designing test items; for example, the lack of general consensus about 
the application of pedagogy to the content areas of social studies and science made it 
more difficult to create pedagogical content items in these areas. In the area of 
reading/language arts, however, there is more agreement in the field about appropriate 
teaching methods and sequences, and thus there are more reading/language arts peda- 
gogical content items. A large proportion of these items, however, require teachers to 
correctly match learning activities to a named teaching method. Thus, in the areas of 
social studies, science, and reading/language arts, the teachers' criticisms of the test, i.e., 
that it does not necessarily reflect one's ability to teach, were sound. In other areas, 
such as mathematics, many of the items consisted of activities such as identifying the 
correct instructional sequence of worksheets to teach a specific concept, identifying 
concepts whose mastery was necessary to teach a specific new concept, or identifying a 
pattern of student errors. It would be more difficult to make the case that performance 
on these types of items is unrelated to teaching competence. 






Comparison with Other Multiple-Choice Tests 

whn tnnfrrti^^v?^°^"^*^P^®;^^f ''^ ^^^^ ^ common method of assessment, the teachers 

examinSs^re "r^rS;iT''^'' l°l!"'lj£?,.^^"iination to be veiy similar to other 
fo woS tentv f^^ t!'^ r °^ ^ ' ^"'^ teachers specifically judged it 
rLoon^effhat w^S^'Sk' t^^^^ers gave no response, and the remaiSna teachers gave 
responses that were ambiguous or did not address the question. 

Assessment Format 

" admimSr'fiSe mS!f;?f°i'* examination is one of the easiest assessments to 
tSS SSlitilff^rS^f J- a relatively short period of 

Sh fnr if.i, admmistration are generally available, and the number of staff re- 

SSSfeS/e oft-STc^Vn"'?^^^^^ '^^ multiple-choice format mea ures 

S o^Snde^tanJina^^^^^^^^ ^^'^ information, but cannot measure 

Th?fo?mafals?!7S ^! ^^^er assessment approaches, 

who Da^dnatedln thfc ''?"u"S- ^°""^t considered by the teachers 

meStM^XSof t?e orTn^ "^'^ '^^"es specific to this assess- 

ment, tne Clarity ot the oral overview and d rect bns presented at the start of thp 

~b'a?k '""'^ """^ "■■"•■"8 °f 'St. .each*%refereS*s regard- 

Clarity of Oral Overview and Directions 

ivw ^PP"™>^'«')' °f 137 ) of the teachers found the oral overview eiven 

.1,^ j°"'''J'" assessment in CaUfomia be explored further we sueeest 

Lees tte ^P^^^d by adding information about CdTcapped S- 

T^'^^^'^^'^^X^^'^ZXi'^' " -"-"'^"^ Of an^oSl prt 






Clarity of Items 

When asked if they had trouble understanding any of the questions, 64% (88 of 
137 teachers) responded "no," and 35% (48 teachers) responded "yes." (One teacher 
did not respond.) Teachers apparently had the most difficulty with the length and/or 
wordiness of many of the questions, and with the terminology used in some of the 
questions. Several teachers also cited difficulty with the many questions that asked for the 
MOST or LEAST appropriate answer. Other sources of difficulty were the actual 
content of the questions (e.g., science or math) and the use of reference materials (i.e., 
the materials-based items). The latter were most often found confusing because of their 
length (four successive pages of worksheets to review in order to answer one question) 
or format (the necessity of flipping back and forth between pages to answer the 
questions). Some teachers also found that for some questions more than one answer seemed 
correct. 

Our analysis of the format of the questions revealed that many of the items are 
long and wordy, and some, especially the Language Arts items, refer to very specific 
terminology (e.g., the Cloze method of teaching reading). In the former case, we did 
not find that the information presented was necessarily extraneous, and in the latter 
case, we noted that if a teacher did not recognize the terminology, she/he would most 
likely be unable to answer the question. 

We also agree with the teachers who cited difficulty with the MOST/LEAST 
questions. As FWL staff took the test, we noticed that some of our answers were incor- 
rect because we had neglected to read that the question asked for the MOST or LEAST 
appropriate answer. Sometimes, if we had answered one or two questions that asked for 
the MOST appropriate answer, we tended to assume - incorrectly - that the next ques- 
tion was asking for the same response. The MOST/LEAST format increased the proba- 
bility of marking a response that did not accurately reflect a teacher's knowledge. If 
only a few items were of this format, the difficulty might not be important, but because 
at least 50% of the items on the two forms analyzed are of this format, the low results 
may not necessarily reflect a lack of knowledge but possibly an incorrect reading of 
some or many of the items. This problem could be remedied by limiting the format to 
either MOST or LEAST items, but not both. 

We also understand why some teachers found the materials-based items to be a 
source of difficulty. In this case, the difficulty is not in choosing the correct response to 
the items, but rather the time it takes to read or look at up to four pages of reference 
materials in order to answer one question. The reference materials comprise at least 
33% of the test pages, but only about 15% of the test items. For example, on Form #5, 27 of 
the 80 pages are reference materials corresponding to 12 of the 77 items. Because of 
the additional reading required, it is probably safe to say that the materials-based items 
require more time to answer than do the other items. If the test is revised, more items 
should be added for the longer sets of stimulus materials. 
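
The imbalance between reading load and item yield on Form #5 can be made concrete with a short calculation. This is a sketch only, using the four figures quoted in the paragraph above (27 of 80 pages, 12 of 77 items); the variable names are illustrative.

```python
# Form #5 figures as reported in the text.
total_pages = 80
reference_pages = 27   # pages of reference (stimulus) materials
total_items = 77
materials_items = 12   # items that depend on the reference materials

# Share of the booklet taken up by reference materials.
page_share = reference_pages / total_pages

# Share of the scored items that those materials support.
item_share = materials_items / total_items

# Average number of reference pages a teacher must read per materials-based item.
pages_per_materials_item = reference_pages / materials_items

print(f"Reference materials: {page_share:.0%} of pages, {item_share:.0%} of items")
print(f"Reference pages per materials-based item: {pages_per_materials_item:.2f}")
```

Adding items to the longer sets of stimulus materials, as recommended, would raise the item share without adding pages and so lower the pages-per-item figure.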



Timing of Tests 

Although the pilot test administrations were untimed, teachers were told to 
expect the examination to take about two hours. Ninety-six percent, or all but six, of 
the 137 teachers felt that two hours was sufficient time to take the test. Teachers 






fver'SS;;!^''^ '''' ^"'^ ^ ^° hours, with a few teachers taking just 

time l^tbl^St'^fhTr^t^*'^^^''^^^^^^ "s^'J ^" California, we suggest that a 

U^imS teSs^in ilf essentially all examinees to reach the last item on the test, 
test Srial^SS™^^^^^ f-^^^T' °^!;^^"uS. ^^^^ ^"ters, locating test facilities, having 
of LaSes ihoT nnt li^""^ i same-day shipments are needed, and increasing anxiety 
of examinees who do not know how to pace themselves or how much time to take. 



Feedback 



what t^S'^lutf^^^^ ^^^J"! assessment, teachers were asked 

nS^d ne^Ln^^^ ^ ^" ^^^P?^- ^^"^^ °^ ^^e different answers and 

n jmoer and percent of teachers who gave them are as follows: 



overall score/scores in different areas - 27 (20%) 
weaknesses/areas needing improvement - 27 (20%) 
strengths and weaknesses - 21 (15%) 
the right answers - 14 (10%) 
no feedback - 10 (7%) 



when a^TiSTh^fnr^^? ^'tuS ^° t^.'^ ^^^"^ ^^^^back should be given, 

p^^^^ i &t^^ch??ra ^f^;^. 

enor'4%fTX^^^^^ °i somcon?;;Sm ?h\%S?^t6 t^^^^^^^ 

soL as DossibS^ a^^^^^^ the feedback should be given as 

(e.g" cTpSetprin" ?a°r?&).^'°"^' ^^"^^^^ - ^ °f -"en form 

suggest JS°';^^^'?f i^'^ "^'^ teacher assessment, we 

vanS foms oT^hP tc^ P^?'^?!'^ """"^^s as scaled scores that equate the 

the oTnas^ barelv fel? °" ^ow well they did, possibly in 

s^ma^asoL? of tei^^^ ??f ^^"^ - J^^" Percentile rank. With a test covering 
so many aspects of teaching, the provision of results by types of items is questionable 
and could lead to misinterpretation of the test results. 

Cost Analysis 

the Eleme^^'r^ pl^Jftiin^ associated with the development and pilot testing of 
uL r3^^ Education Exam, we estimate that the cost to administer and score this 
Sts°sffaTtrC3^^^^^ multiple-choice rrdTesptsTaTset' 

thr$32 40 rfnae n^5? T,^"'^ ^ ^''^"l'" ^® °^ ^^ese exams curtently are in 
arLter fnrS^^H ^fl'^^'" .^^hough the development costs would be somewhat 
greater for the types of items that are included in the Elementary Education Exam, the 
administration and scoring costs would be similar to these. 






Technical Quality 



Item difficulty and correlational data were provided on the Elementary Education 
Examination. Summaries of these are provided in Appendix C under Construct Validity. 
The mean comparisons that are provided illustrate the level of difficulty of the different 
areas in the Elementary Education Examination for different groups. These data 
demonstrate that questions were of moderate difficulty. Mean p-values (the percent of 
teachers getting items correct) were in the high 60's and low 70's across subject areas. 
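
To illustrate how a mean p-value is obtained, the sketch below computes the proportion of teachers answering each item correctly and then averages across items. The response matrix is entirely hypothetical (four teachers, four items) and is not drawn from the pilot data; the function name is likewise illustrative.

```python
# Hypothetical scored responses: one row per teacher, one column per item.
# 1 = item answered correctly, 0 = item answered incorrectly.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]


def item_p_values(matrix):
    """Return, for each item, the proportion of teachers who answered it correctly."""
    n_teachers = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[j] for row in matrix) / n_teachers for j in range(n_items)]


p_values = item_p_values(responses)
mean_p = sum(p_values) / len(p_values)

print("Item p-values:", p_values)
print("Mean p-value:", mean_p)
```

In this toy example the mean p-value is close to the range the report describes as moderate difficulty; a value near 1.0 would indicate a very easy set of items and a value near the chance level a very hard one.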

Correlational data did not support that the different subtests form clear and 
separate scales. For example, the math scores on the Elementary Education Examination 
correlate no better with math scores on the SAT than they do with Language Arts 
or Social Studies scores on the Elementary Education Examination. These data should 
be interpreted with caution since the items were in the initial pilot test stages. Other 
analyses that might be of interest are to separate the "traditional" and "innovative" 
items to determine whether there is differential performance and information that is 
~ia!,y ^^^^^ ^^^^ ^^^^ ^^^^ ^° P-^°™ ^hese analyses were 
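
The comparison described above (whether a subtest correlates more strongly with an outside measure of the same subject than with other subtests) can be sketched as follows. All scores here are hypothetical and were chosen only to reproduce the pattern the report describes; they are not pilot data, and the Pearson function is a plain textbook implementation rather than the procedure used by the test developers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for five teachers (illustrative only).
exam_math = [1, 2, 3, 4, 5]           # math subtest of the examination
external_math = [2, 1, 4, 3, 5]       # an outside math measure
exam_language_arts = [1, 3, 2, 5, 4]  # language arts subtest of the same examination

r_same_subject = pearson_r(exam_math, external_math)
r_other_subtest = pearson_r(exam_math, exam_language_arts)

print(f"math vs. outside math measure: r = {r_same_subject:.2f}")
print(f"math vs. language arts subtest: r = {r_other_subtest:.2f}")
```

If the subtests formed clear and separate scales, the first correlation would be expected to exceed the second; when the two are about equal, as in this toy example, the data do not support treating the subtests as distinct scales.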

The Elementary Education Examination items and specifications should also 
undergo a content review by California educators. They should examine each item for 
fr^Ii^' ^^t""'^^! sensitivity, job relevance, and relationship to the California curriculum 
frameworks and credentiahng requirements. They should look at the test specifications 
for completeness, clarity, importance for credentialing, and job relevance. All items 
should also be reviewed by professional editors and sensitivity reviewers, if they have not 
^/!?%V^ '^^s^ reviews should be done prior to the field testing 

of the Elementary Education Examination if California decides to explore further the 
possible use of this assessment instrument for credentialing new teachers in California. 

In considering the possible use of a pedagogical knowledge test in California, 
the State must determine what additional information would be provided by this 
assessment approach above and beyond what is already provided by other tests that are 
used for credentialing decisions (e.g., NTE General Knowledge Test), and how 
much the use of this particular assessment would result in improved credentialing 
decisions. 

Conclusions and Recommendations 

This section contains conclusions and recommendations regarding the Elementary 
Education Examination, organized into the areas of administration, content, format, and 
a brief summary. 

Administration of Assessment 

Like other large-scale multiple-choice examinations, the Elementary Education 
Examination is administered simultaneously to a large number of people. Benefiting 
from many years' experience in conducting such examinations, the administration of the 




ticauS^^th^^S^""^^^^^^^ P^"^' ^°Sistical problems. The most crucial logis- 
&aiS?n L muc^ r^^^ assessment sites. Although the Elementary Educatfon 
SoS ots^^th^^^T^''7 '° administer than the other assessments piloted, the 

Srefo5e fhP hfawT 2" 1^"^^' °^ teachers participating at a single 
^aielavaeThm^^^ afforded by this assessment may^ 

to f selected she "''^^ ^'^^ "^^^ ^" ^^^^ ^° ^^'^^^^ ^^^"^J hours 

The following improvements in test administration are suggested to bring the 
administration of the Elementary Education Examination more in line with generally 
accepted practice: 

° SonUor°tL°t"told!ig^°^^ proctors in addition to the test administrator to 
o elimination of the use of scrap paper, which poses a security risk; and 

° ?nn«f t?! test booklets to reduce the incidence of addi- 

tional information provided to the candidate. 

also su?g?sMh°ff Sfr.?"/."P? u ^° orientation materials for this assessment, we 
£ctSUr?lnc K, ""^^^"^^^^ '■^'?'^.^ ^° "^"'^ ^^^^'^V '"^icate the test content and 

Assessment Content 

test itemf on SJo "J'!?/!"" '^^'^"^ ^^^"sors, our analysis of the 

of the examination's six forms, and on performance results, we offer 
the following conclusions about the content of the Elementary Education Examination: 

o Congruence of the test with the various California curriculum guides 
eSl'S^r/^ '° Not all curricular emphases are reflect- 
wifc ^^^.^"Lftems. however, and the balance of itenTs across ^rade 
levels and subjects, especially in the area of science, could be improved. 

° SS^fof S^ainn^rr* °u ^°^"^g^by this test of the California Stand- 
need ^rS?f ^ Te&chers is good, but depth is lacking. The greatest 
need for improvement is in the areas of cross-cultural teaching and 

leSls^es."^ "^^""'^ '"P^^^ ^° "^^"^'^ 

o The job-relatedness of this assessment was affirmed by a majority of the 
participating teachers. Those teachers who disagreed tended to be 
primary-grade teachers who perceived many of the test items as not 
being relevant to their grade level. 

o The "materials-based items," which constituted a special feature of this 
assessment, were also judged by a majority of the participants to be 




reflective of the tasks they perform as teachers. Some teachers, however, 
such as those of many Limited English Proficient students, felt the 
items reflected the "ideal" and not the "real" classroom. 

o Although the majority of teachers did not think this test was too difficult 
for beginning teachers, the beginning teachers participating in this 
assessment answered, on average, only two-thirds to three-fourths of the 
items in each area correctly. 

o An analysis of the content of two of the exam's six forms revealed a 
disproportionate number of items representing the subject areas and 
concepts associated with the upper grades (i.e., grades 4-8), as well as a 
lack of items addressing the teaching of diverse student groups (e.g., 
bilingual/LEP students, students characterized as low- or high-ability). 

o The content analysis also indicated that the subsets of items measuring 
teaching competence varied across subject areas in the degree of sophistication 
required to answer questions, varying, for example, from the 
ability to match an activity with a particular approach to reading instruction 
to the ability to identify the best manipulative to teach a 
specific concept in math. 

o Although the test was judged by teachers to be fair to new teachers of 
both genders, different ethnic groups, etc., 40% of the teachers did not 
think this assessment is an appropriate way to assess teacher competency. 
Many teachers objected to what they perceived as a focus on assessing 
content knowledge, and others on being tested on subject matter 
they have not taught. Several teachers suggested that the multiple-choice 
assessment be replaced by or supplemented with more direct 
measures of teaching such as interviews and classroom observations. 

If the Elementary Education Examination is considered for further development, 
a content review by California educators should be conducted to examine each item for 

cSulu^Sf "h?^ ^^^^^^"'^^^ relationship to the Call Sa 

fooT?t r.?^ ? and frameworks and credentialing requirements. They should also 
look at the test specifications for completeness, clarity, importance for credentialing, and 
JfvietrTprior^^^SdTsr^^^^ by ^e^sional editors and sensi^;i^ty 

Assessment Format 

rnntr.cBf will be discussed in more detail in Chapter 8 and 

fP..S. ! classroom observation and semi-structured interview methods of 

poHna f '^''"^^"^ Its strengths appear to be ease and efficiency of administratton and 
scoring, as well as an ability to cover a wide range of subject areas. 

The format of this assessment could be improved in two respects. First, procedures 
to handle handicapped examinees, late arrivals, irregularities, or other special 






ontlSS^wlS iSSid- following problems with specific types 



o The materials-based items were very lengthy, consisting of up to four 
pages of reference materials for one question. Since these items most 
directly reflect actual instructional decisions, they should not be abandoned, 
but the number of questions that are related to the most extensive 
reference materials should be increased. 

o Items that require teachers to identify the "most" or "least" appropriate 
instructional technique constituted half of the items on the test. These 
questions were confusing, with reports of instances of marking the 
"most" appropriate answer when the "least" was required or vice versa. 
This format was especially common for items that assess pedagogical 
content knowledge. These items should be kept to a minimum, and 
consideration should be given to limiting the format to either "most" or 
"least" items throughout the test. 

rnrrent'^^^^i!"i„-thP^^''f 77°*^ ^ sufficient time in which to complete the test in its 
current format with about 77 items per form. 

Summary 

Flpmen^JS^pJnll?."^^^" '"."^^'Ple-choice examinations, the innovative elements of the 
choS «?.7«menf ^Pp^' ^° '"'^'"^^^^ ^^e job relevauce of a multiple- 

s' t^tnc^.Ti innovative elements include the use of questions that explicitly 

nJn?r, T:'^'^^^ ^^^^^sroom contexts, and the use of teaching mat-rials (such as student 
fS J^t^^^^r manuals) in the test booklet However, taken as a whole, the 
sStf Lnd^tMn'?^ ^ 't™' °f i^^P'-o^ed balance across grade levels and 

ten SwipHa^i^ subjects, across the types of knowledge that are assessed (i.e., con- 
SsesSt 'Sro^nhT^' P^dWaf content knowledge). The relevance o this 
fo be Sinffi nl° ""^^ Z?'^ 'P^"^'^ populations also needs 

format r«mnctt.5» "° ^u^"^^?? ^° the items separately by either 

S^t^pedag^^) °' "'^^^"^^^-''^s^d "^n^^) ^cus (content, g^eneral pedagogy, or 






CHAPTER 7: 

SEMI-STRUCTURED INTERVIEW: SECONDARY SOCIAL SCIENCE 






The Semi-Structured Interview for Secondary Social Science (SSI-SSS) was de- 
veloped by the Stanford Teacher Assessment Project (TAP). Like the SSI-EM, it was 
developed as a prototype of a type of examination to be used to certify distinguished 
master teachers. The Stanford TAP had previously administered the SSI-SSS to a 
sample consisting mostly of experienced teachers, but with a few student teachers and 
first-year teachers. 

The SSI-SSS resembles the SSI-SM and SSI-EM in that it combines the Struc- 
tured Interview and Assessment Center formats. It consists of three tasks: 

(1) Reviewing a Textbook: A candidate reviews a textbook and com- 
pletes a form which solicits a critique of specific aspects; 

(2) Planning a Lesson: A candidate spends thirty minutes planning a 
lesson on a given topic and then responds to questions about that 
lesson; and 

(3) Use of Documents: A candidate is given a group of documents to 
study, selects two as suitable for serving as the focal point of a series 
of lessons, and responds to questions about both their choice and the 
use of documents in social science classrooms. 

Like the SSI-EM, the SSI-SSS required revision of the protocols based on the 
previous experience in administering the assessment. FWL and CTC/SDE staff met with 
one of the original test developers from the TAP, who provided guidance for changing 
the protocols. Due to previous time commitments, he could not make the changes 
himself. Assessors were recruited, and training, to be conducted by FWL staff, was 
scheduled for May 25. 

In preparing for the assessor training, we became concerned about the feasibility 
of pilot testing the SSI-SSS for several reasons: 

(1) The level of difficulty of the tasks was perceived to be high. We believed 
that teacher preparation programs do not instruct students in 
textbook review and the use of documents, and new teachers would be 
unprepared to do these tasks. Furthermore, the test developer had 
stated that he was very disappointed in the performance of master 
teachers in the first pilot test, which reinforced our reservations. 

(2) Revision of the protocols included changes in the documents used. 
The test developer provided the new set of documents, while FWL 
staff made changes in the protocols. In reviewing the revised "Use of 
Documents" protocol, we were not certain that the questions were 
consistent with the revised set of documents. The test developer had 




no further time to revise the protocols before the scheduled training 
of assessors. We were uncomfortable about conducting the training of 
assessors when we ourselves did not understand the intent of some of 
the questions. 

We reported these concerns to CTC/SDE staff, and it was decided to postpone 
the pilot test of the SSI-SSS until the protocols could be examined by experienced 
secondary social science teachers. After the experience with administering the SSI-EM, 
where the content was perceived to be too difficult for beginning teachers by both assessors 
and many teachers, it was decided to defer the pilot test of the SSI-SSS until 1989-1990. 
Another semi-structured interview in secondary social studies which has been 
specifically developed for beginning teachers by the State of Connecticut is also being 
considered for pilot testing. 






CHAPTER 8: 
CONCLUSIONS 






This final chapter begins by summarizing our conclusions about the assessment 
approaches that were pilot tested during Spring 1989. We then suggest a framework for 
comparing our findings about the strengths and weaknesses of different assessment 
approaches. We conclude by identifying issues to be explored in the next round of 
developing and pilot testing assessment instruments, and decisions that should be made 
prior to selecting any assessment approach. 



Assessment Approaches 



Although the purpose of the pilot tests was to use the specific instruments to 
learn about the potential of assessment approaches, the preceding chapters mainly 
focused on individual instruments. This section compares each instrument that was pilot 
tested to other instruments representing the same assessment approach, and summarizes 
our conclusions about the critical features as well as the strengths and weaknesses of 
each approach. These conclusions are based on our in-depth examination of one or two 
state-of-the-art instruments representing each approach. In formulating our conclusions 
in this section, we tried to go beyond our experience with each individual instrument to 
imagine the development of parallel indicators, tasks, or questions, either to extend the 
approach to new domains or to better address the domains of teacher competence that 
we examined. 

Each instrument reflected one of three assessment approaches: classroom observation, 
semi-structured interview, or multiple-choice examination. 



Classroom Observation 

Definition. A classroom observation approach to teacher assessment consists of 
observing teachers as they instruct students in their classrooms to evaluate their 
performance. Two dimensions of classroom observation systems are: open vs. closed and 
low- vs. high-inference. In an open system, the observer attempts to describe "all" 
behaviors that occur without regard to selection or interpretation. In a closed system, 
the observer focuses on specific behaviors or categories of behavior. In open systems, 
evaluators judge the quality of performance without the benefit of a careful definition of 
what to look for in classrooms. Observation systems that are typically used for teacher 
assessment are closed systems. The low- vs. high-inference systems differ in the degree 
of specificity in the behaviors judged. In low-inference systems, criteria are defined in 
terms of specific behaviors, allowing little observer discretion. High-inference systems 
describe the behaviors more generally, requiring more use of the observer's judgment to 
identify and judge behaviors. 






Characteristics of instruments piloted. As a state-of-the-art example of a classroom
observation instrument, we pilot tested a high-inference classroom observation
system, the Connecticut Competency Instrument (CCI). In addition, we examined
instruments used in Florida and Georgia to assess beginning teachers.

The CCI is a high-inference classroom observation system in which 10 indicators
are used to represent major aspects of instruction: management of the classroom
environment, instructional process and student assessment. Four assumptions underlie
the design of the instrument and distinguish it from low-inference competency check
lists that are used as teacher credential requirements: (1) it acknowledges that
effective teachers practice in many different ways; (2) it focuses on generic teaching
abilities; (3) it is intended for beginning teachers; and (4) it emphasizes the
importance of professional judgment in rating performance.

Strengths and weaknesses. Classroom observations assess teachers in the process
of doing their jobs, which gives the approach high job relevance and face validity.
This assessment approach is the one that teachers most often suggested as a specific
model when discussing the job relevance or appropriateness of the assessment in which
they participated.

Stodolsky (1989) has questioned the utility of classroom observations as a
form of teacher assessment. Her research suggests that the subject being taught
affects the behaviors that are measured by common observation evaluation systems:
the extent to which higher-order thinking skills are a focus of the lesson, [...],
and the likelihood of observing teacher-directed (e.g., [...]) or [...] (e.g., [...]
approaches) instructional formats. While observation systems that consist of
behavioral check lists are vulnerable to this criticism, it does not apply to the CCI,
which relies on professional judgment in light of the goal of the lesson as stated by
the teacher and allows for differing approaches to the same goal.

Stodolsky also concludes (1988, p. 12) that "teaching is context dependent [...]
stability is likely only within well-defined contexts, such as lesson types within
[...] levels." By allowing the teacher to select the lesson to be observed, the CCI
lets the teacher choose the subject and the lesson approach that best display his or
her competence. With a limited number of observations, it is impossible to observe
a complete sample of teacher behaviors across all subjects, grade levels, and [...]
during a year. This sampling problem is not, of course, peculiar to classroom
observations, but is true of other assessment approaches as well.

In general, classroom observations are well suited to assess the actual application
of knowledge of teaching in the classroom. Significant types of teacher classroom
behavior can be identified, and the methodology is available to overcome technical
problems of inconsistent observations. However, not all knowledge domains are
reflected in classroom behaviors. Compared with other approaches to assessment,
classroom observations are most appropriate for assessing limited samples of content
knowledge and [...] samples of general pedagogy, content pedagogy, knowledge of
learners and learning, and management of classroom climate.






Semi-Structured Interviews



Definition. Semi-structured interviews provide opportunities for candidates to
respond orally to a standardized series of questions or tasks that are presented verbally 
by an examiner who uses a script known as an interview schedule. Semi-structured 
interviews include "probes" to be used at the administrator's discretion to enable candi- 
dates to elaborate on their responses. 

Characteristics of instruments piloted. Both the SSI-SM and the SSI-EM, which
serve as state-of-the-art exemplars of semi-structured interviews, draw largely from Gaea
Leinhardt's research in mathematics in which teachers perform tasks and then explain 
their responses. We found few examples of interview assessments, so the SSI-SM and 
SSI-EM serve as early prototypes of the interview approach to the evaluation of teach- 
ers. 

Strengths and weaknesses. The strength of the semi-structured interview is its 
ability to assess depth of knowledge, especially knowledge that impacts instructional 
planning and decision-making. The semi-structured interviews have a good potential to
adequately sample knowledge domains with a limited range (e.g., educational
philosophies and goals). Therefore, we see semi-structured interviews as the best
approach to assess knowledge of learners and learning, the effects of school context,
and educational philosophies and goals. For other domains with a broad range, such as
curriculum knowledge, general pedagogy, and content pedagogy, semi-structured
interviews assess the subject, topic and grade level addressed in the interview quite
well, but the coverage of these domains is only partial. The direct relationship of
teacher responses during
interviews to their actual classroom behaviors is limited (except for content knowledge). 

To our knowledge, the question of whether teachers are able to implement the 
activities described during interviews has not been systematically studied. One research- 
er (Wilson, 1988) observed classrooms and discussed teacher patterns of behavior with 
supervisors of a small group of teachers participating in semi-structured interviews. She
found that the activities described during the interviews were consistent with those
observed or described by supervisors; however, this study was in conjunction with
voluntary participation in a research project. In the case of a credentialing requirement,
teachers would have a much greater incentive to represent as strong a performance as 
possible in the interview. 

It is critical for semi-structured interviews to contain questions that explicitly 
address all the areas to be scored. The degree of explicitness needed was underestimat- 
ed by the developers of the two semi-structured interview prototypes which were pilot 
tested. Without either explicit questions or the ability to probe, it is impossible to know 
whether a teacher's lack of response in a particular area is due to a lack of clarity or
a lack of ability. Moreover, the use of discretionary probes in a "high stakes assess-
ment" could serve to prompt correct responses among some of the teachers being as-
sessed. The dynamics of probing ambiguous responses should be studied to see if 
probes are either necessary or desirable as part of the semi-structured interview ap- 
proach to teacher assessment. 

Like observations, well-designed interviews can meet standards of technical quality.
Unanswered yet is the question of the amount of distinctive information that interviews
add to assessment decisions. Studies of convergent and discriminant validity are
needed for any combination of measurement approaches considered in a teacher
assessment system.



Multiple-Choice Examinations 



Definition. Multiple-choice examinations are those in which each candidate selects
a response in a fixed-response format. [...] theoretical and applied questions (e.g.,
implications of Piaget's theory [...]) presented in an undefined classroom context.

Strengths and weaknesses. The strength of multiple-choice examinations is their
ability to sample broadly the domains of knowledge that can be represented in a
fixed-response format. However, the ability to assess depth of knowledge is limited
and the relationship between performance on the examination and actual classroom
practice is modest.

Framework for Comparing Differing Assessment Approaches 

The assessments of teacher performance that were pilot tested in 1989 represent
three approaches: classroom observation, semi-structured interview, and multiple-choice
examination. Each of these approaches allows direct, authentic measurement of some
domains of teacher performance, while providing [...] information about other domains
of performance. To compare the strengths and weaknesses of different assessment
approaches, it is important to have a framework which includes a broad range of
teaching competencies. In this section, we describe such a framework for comparing
the assessment approaches that have been piloted.






To identify a broad range of domains of teacher performance, we began with a
review of literature on teacher assessment that discussed teacher competencies, or the
knowledge base of teaching. We found surprisingly few publications that explicitly
delineated competencies (See Shulman, 1987; Wilson, 1988; Leinhardt, 1989). We also drew
from our observations of the assessments that we pilot tested and of two other assess- 
ments pilot tested by the Stanford Teacher Assessment Project in the summer of 1989. 
After this review, we decided that a list compiled by Shulman (1987) seemed to be a 
good summary. To Shulman's list, we added two additional domains, Classroom Climate 
and Professional Collegiality. We do not advocate that beginning teacher knowledge 
and/or practice be assessed in all these domains, but we believe that it is important to 
identify as complete a range as possible to fully compare different assessment approach- 
es. 

The domains identified are: 

(1) Content Knowledge: Understanding principles and concepts in the 
subject(s) taught, their interrelationships, and their application to 
other related content areas. 

(2) Knowledge of Curriculum: Understanding hierarchical and nonhierar-
chical relationships between the concepts and principles of the content 
area which guide construction of curriculum, resources available to 
teachers, and of theories guiding the development of curriculum. 

(3) General Pedagogy: Understanding and using generalizable approaches
and methods for managing, planning, presenting and assessing
instruction.

(4) Content Pedagogy: Understanding alternative representations of the 
subject matter, knowledge of student conceptions related to particular 
topics, and pedagogical reasoning related to the particular content 
being taught. 

(5) Knowledge of Learners and Learning: Understanding how to tailor
instruction to student culture, language, interests, background experi- 
ences, and cognitive and physical abilities. 

(6) Management of Classroom Climate: Understanding approaches for
creating an optimal physical and psychological environment for learning,
maintaining rapport with students, setting high expectations for
achievement, and establishing norms for student-student interactions. 

(7) Knowledge of Effects of School Context: Knowledge of the organiza-
tional, political, cultural and social context of the school. 

(8) Knowledge of Educational Philosophies, Goals, and Objectives:
Comparative knowledge of educational philosophies, goals, and objec- 
tives, including their bases and justifications. 






(9) Professional Collegiality: Knowledge and disposition for continuing
professional development and collaboration with colleagues. 

Additionally, to evaluate how well an approach can measure each of the nine
domains, we recommend that California consider the extent to which each assessment
could assess three dimensions of each knowledge domain:

o Sampling: the number and range of aspects (e.g., concepts, contexts,
situations, skills) that the assessment approach can tap. The key issue
is how broadly the modality can sample the domain of knowledge/skills.

o Depth: the extent to which the focal skills, tasks, questions, or
responses provide evidence of the teacher's knowledge, understanding,
or reflective reasoning about the domain.

o Application: the degree to which the focal skills, tasks, questions,
or responses match the teachers' thoughts and actions as they occur in
actual teaching situations.

Using this framework, hybrid forms of assessment approaches can be designed to
take advantage of strengths and to compensate for the weaknesses of a single approach.
Examples include an observation and a semi-structured interview; an observation and
[...]; or a semi-structured interview with a portfolio. We anticipate using this
framework to evaluate a wider range of assessment approaches, including those pilot
tested this year, at the end of the next phase of pilot testing.

Table 8.1 illustrates how evaluations or ratings of assessment approaches might
be displayed across the nine knowledge domains and three evaluation dimensions, using
three hypothetical assessment approaches. The table would be interpreted in the
following way: The strengths of Approach 1 lie in its ability to assess a teacher's
ability to apply their knowledge in an actual teaching situation, especially in certain
domains; the approach is less able to do so in depth. The potential for sampling entire
domains is limited, although it allows a partial sampling of a few domains. By
contrast, Approach 2 assesses knowledge in depth in most areas, but exhibits a
limited ability to assess a teacher's ability to apply knowledge. Its sampling
ability [...] that of Approach 1 for most domains. The third approach is the
strongest of all three in its sampling ability, which is its strength; it is weak in its
ability to assess a candidate's knowledge either in depth or in relation to classroom
application.
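The framework of nine domains and three dimensions can also be sketched as a small data structure. The following is a minimal illustration only, not part of the pilot study: the names `Coverage`, `blank_profile`, `combine`, and `gaps` are hypothetical, the numeric ordering of the coverage levels is an assumption, and the rule that a hybrid takes the best rating in each cell is a simplifying assumption rather than a finding. The ratings entered are placeholders, not values from Table 8.1.

```python
from enum import IntEnum


class Coverage(IntEnum):
    """Extent-of-coverage levels named in the Table 8.1 legend (ordering assumed)."""
    NOT_AT_ALL = 0
    VERY_LIMITED = 1
    PARTIAL = 2
    EXTENSIVE = 3


# The nine domains and three dimensions listed in this chapter.
DOMAINS = (
    "Content Knowledge",
    "Knowledge of Curriculum",
    "General Pedagogy",
    "Content Pedagogy",
    "Knowledge of Learners and Learning",
    "Management of Classroom Climate",
    "Knowledge of Effects of School Context",
    "Knowledge of Educational Philosophies, Goals, and Objectives",
    "Professional Collegiality",
)
DIMENSIONS = ("sampling", "depth", "application")


def blank_profile():
    """Return a profile that rates every domain/dimension cell NOT_AT_ALL."""
    return {d: {dim: Coverage.NOT_AT_ALL for dim in DIMENSIONS} for d in DOMAINS}


def combine(*profiles):
    """Hypothetical hybrid rule (an assumption of this sketch): best rating per cell."""
    return {
        d: {dim: max(p[d][dim] for p in profiles) for dim in DIMENSIONS}
        for d in DOMAINS
    }


def gaps(profile, minimum=Coverage.PARTIAL):
    """List the (domain, dimension) cells rated below `minimum`."""
    return [
        (d, dim)
        for d in DOMAINS
        for dim in DIMENSIONS
        if profile[d][dim] < minimum
    ]


# Placeholder ratings for illustration only; they are not pilot-test results.
observation = blank_profile()
observation["General Pedagogy"]["application"] = Coverage.EXTENSIVE

interview = blank_profile()
interview["Knowledge of Learners and Learning"]["depth"] = Coverage.EXTENSIVE

hybrid = combine(observation, interview)
remaining = gaps(hybrid)  # cells the hybrid still covers less than partially
```

Listing the cells that remain below a chosen threshold is one way to see which domains a proposed hybrid would still leave largely unassessed.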

Cost Estimates 

Assessments that strive to validly evaluate teaching competence and performance
are more expensive than traditional multiple-choice examinations. Given the
developmental status of the assessment instruments that were pilot tested, the cost
estimates in this report were included to illustrate the ingredients which compose the
various assessment approaches, and not to provide accurate cost estimates to be used in
policy decisions. Cost estimates are highly sensitive to decisions regarding the
administration of an assessment. For example, Connecticut reduces the costs of
administering the CCI by training large numbers of observers who are then provided up
to six days of release time by their employers, with the Connecticut State Department
of Education responsible for paying for substitute teachers. When California is ready
to select an approach (or approaches) to assessment, it will need to identify a
cost-effective method of administration that preserves the technical quality of the
approach.

TABLE 8.1

[...] AND THEIR ABILITY TO ASSESS SPECIFIC TEACHING COMPETENCIES

Columns: Performance Domain; Assessment Approaches (Approach 1, Approach 2, Approach 3)

Rows: Content Knowledge; Knowledge of Curriculum; General Pedagogy; Content Pedagogy;
Knowledge of Learners and Learning; Management of Classroom Climate; Knowledge of
Effects of School Context; Knowledge of Educational Philosophies and Goals;
Professional Collegiality

Extent of Coverage (legend): Extensive; Partial; Very limited; Not at all

[Cell symbols not recoverable.]

Next Steps in Designing a System of Teacher Assessments

The design of a system of assessment for teachers has two major components:
the choice of assessment instruments and the design of an assessment system.
In this section, issues that could be explored in future pilot tests of the California
New Teacher Project to inform choices of assessment instruments are discussed,
followed by some issues that need to be addressed in the design of an assessment
system.



Future Pilot Tests



Plans for the second year of the three-year California New Teacher Project include
the development and pilot testing of assessments which exemplify approaches other
than those pilot tested in 1989. Assessment approaches to be pilot tested in the
spring of 1990 include [...] responses, structured simulations, analysis of [...],
and portfolio review. The subject areas to be assessed will include secondary [...]
and elementary teaching. In addition, the Interagency Task Force [...] to commission
the development of additional assessments as [...]. The next two years of pilot
testing provide opportunities for exploring issues related to evaluating assessments.
Given budgetary and time constraints, not all can be explored. The issues discussed
below are intended to be a representative rather than an exhaustive list, but include
what we believe to be major questions unanswered by this initial round of pilot
testing in 1989.




In the development of instruments, it is
important that issues of scoring be considered at an early stage. No
instrument should be pilot tested until it has been subjected to a small-
scale administration (with as few as two or three teachers) to see that
the stimulus materials, questions, or tasks are eliciting scorable
responses. Once this has been accomplished, the larger pilot study can
concentrate on ascertaining how the various subparts form either a
single construct or separate factors, how they correlate between raters,
and [...] administration.

Commissioning assessments in different subjects. The piloting of different
assessment approaches and assessments that measure the ability to
teach different subjects provides an opportunity not only to evaluate the
ability of assessment approaches to measure teaching skills in a variety
of subjects but also to deepen our knowledge about teaching skills.
Estimates of the ability of specific assessment instruments and
approaches to evaluate teaching skills in a variety of subjects can be
facilitated by explicitly identifying similarities and differences between
teaching skills in various subjects. For example, noncognitive skills including
esthetic, aural, and physical abilities play a greater role in teaching the
arts, foreign language, and physical education than in mathematics,
social studies, reading, English, or science. It is likely that teachers of
the former set of subjects need some skills that are not required by
teachers of the latter set of subjects. The development of an assess-
ment of middle school teaching could also assist in the identification of
differences in teaching skills required at various grade levels. Addition-
al skills identified in the research literature can guide assessment de- 
velopers as they construct the assessments; it is possible that other skills 
will be identified during the course of the development and analysis of 
specific assessment instruments. 

Greater range of teaching experience in sample. Evaluating the
appropriateness of particular assessment instruments and assessment
approaches for beginning teachers can be informed by including student
teachers, beginning teachers, and more experienced teachers in the
sample of teachers participating in the pilot tests. Systematic group
differences in performance can facilitate the identification of stages in 
the development of specific competencies, and can guide the choice of 
both assessment approach and the time when it is administered. 

Multiple assessment of teachers. Although we relied on our experience
from the administration and analysis of assessment instruments as a 
basis for evaluating assessment approaches, assessing the same teachers 
with different instruments would provide more explicit data, especially 
for the comparison of assessment approaches. For instance, teachers 
could be observed with the CCI and participate in an additional assess- 
ment addressing the subject taught in the lesson observed. This design 
would increase our ability to assess the extent to which different as- 
sessments provide supplementary and complementary information. We 
could identify tradeoffs when one assessment instrument is chosen over 
another, and redundancies if more than one is used. In the absence of 
a general measure of teaching ability, these comparisons would also 
contribute information about the validity of the measurement of teach- 
ing skills that are assessed by more than one instrument. 

Ability to teach diverse students. In 1989 we found that no instrument
provided a good model for assessing competencies related to the
teaching of diverse students. Assessment instruments that have been
commissioned by the California New Teacher Project for Spring 1990 pilot
testing include an emphasis on the teaching of diverse students, an issue
of increasing importance to California educators. The evaluation of the
success of assessment instruments in addressing this issue can guide the
identification of the next steps to take and pitfalls to avoid in building
this component into all assessments.

Content review. Prior to adoption, all assessment instruments should
undergo review of teaching skills and subject matter content by California
teachers, teacher educators, subject matter specialists, and experts in
performance assessment. Although all instruments commissioned by the
California New Teacher Project have included some review by groups of
California educators, a wider review is needed to assess congruence with
current research and professional norms.

Issues in Design of an Assessment System

As the previous discussion in this chapter has indicated, the utility of assessment
approaches and instruments cannot be evaluated apart from their purpose. The design
of an assessment system includes prerequisite decisions about the purpose of
assessment, which guide the selection of assessment instruments. We end this chapter
by identifying issues that should be considered in the selection of instruments for
use in a system of teacher assessment.

o Assessment focus. Perhaps the most crucial issue is to decide what
competencies are to be used as screens at particular stages of teacher
preparation and teaching. Currently, prospective California teachers are
required to demonstrate basic reading, writing and mathematical skills
and subject matter competence to obtain a teaching credential.
Decisions about requiring passage of some type of performance assessment
will depend upon the relative priorities assigned to specific teaching
skills, as well as on assumptions about the degree to which these skills
are likely to be developed in the beginning years of teaching.

o Breadth of assessment. Another issue concerns how to address the
multiple grades, subjects/topics, and contexts which are covered by a
specific teaching credential. The multiple subjects credential perhaps
offers the most difficult case, because it covers the broadest range
of [...]. To what degree should a range of grades,
subjects/topics and contexts be sampled to provide sufficient assurance
of the competence of the teacher candidate to instruct effectively in all
areas covered by the credential? How should a candidate's experience
teaching specific topics to specific groups of students be utilized?
What should be the relative degree of emphasis between breadth of
sampling and the depth of knowledge gained from experience?

o Flexibility of Assessment System. Since a multi-stage, multi-year
credentialing system is envisioned, and since different teachers develop at
different speeds, the extent of flexibility of an assessment system should
be considered. Should only candidates with scores near the passing
threshold have to take more complex tests? Could candidates be
allowed to compensate for weaknesses in some areas through strengths in
other areas, providing that some minimal competence has been
demonstrated in the weaker areas? If so, what would be required of all
candidates and where should such minimal thresholds be placed?

o Coordination with professional development. To what extent should
credentialing decisions and professional development be coordinated?
The most expensive assessments also tend to provide the best guidance
for training activities to improve teaching. Several states provide
professional development for candidates who fail their first performance
assessment, providing they are rehired by their employing district. The
use of information from credentialing assessments for staff development
might justify the greater expense of those assessments which provide
more information.

o Interagency involvement. Another decision concerns the role of state,
regional, and local agencies in the credentialing process. The adminis-
tration of assessments could range from a highly centralized program
with professional assessors selected and trained by the Commission on
Teacher Credentialing to a program resembling the current system of
credentialing teachers, in which assessments are regionally administered
by institutions of higher education following guidelines established by
the Commission. In Connecticut, teachers, teacher educators, and state
agency staff are trained by the state and provided release time by
their employing organizations to administer the CCI. Other variations
are possible. One example is that the CTC could develop and oversee 
the implementation of an assessment system through a decentralized 
network of agencies, including universities, county offices of education, 
and school districts. Options such as these need to be outlined, broadly 
discussed, and used to evaluate the various assessment options and 
designs. 

o Funding. Another significant decision to be discussed concerns funding.
Given that teaching is a relatively low-paying profession and that an 
increase in the supply of teachers is needed for the foreseeable future, 
it is unlikely that new teachers could be expected to bear the full cost 
of the new assessment approaches. Even apart from fees charged, new 
teachers would incur opportunity costs in terms of time spent preparing 
for and participating in the new assessments. If assessment approaches 
which lend themselves to centralized administration were chosen, then 
rural teachers who live far from assessment sites would not only spend 
greater amounts of time traveling to the assessment site, but would also 
incur additional expenses for travel and perhaps overnight lodging.

While there is a great potential for performance-based assessments to highlight
and strengthen the knowledge and skills that teachers should possess, it is not clear how
to balance increasing standards and the increasing needs for teachers from diverse
backgrounds who might be the least able to afford more costly assessments. Options
need to be outlined which will provide alternative ways of balancing these concerns.

The choice of assessment approaches and the design of a system of teacher
assessment are conditioned by a series of decisions concerning the purpose of the assess-
ments. For this reason, we have not identified the one best assessment approach or the 
best design for an assessment system. Instead, we have summarized what was learned 
from the pilot testing of three specific assessment approaches, suggested issues that 
could be explored with the next round of developing and pilot testing assessment in- 
struments, and outlined issues which should be considered before selecting one or more 
assessment approaches for use in credentialing decisions. 






There is a great opportunity during the next two years to use activities planned for
[...] knowledge, options, benefits and costs for strengthening teacher
[...]. First, the alternative approaches that have been commissioned
by the state agencies for piloting in the second year of the project give promise
of high validity with respect to representing authentic teaching behaviors. Second, the
pilot testing of a variety of assessment approaches can contribute to the identification
and discussion about the various ways in which these assessments might be used such
that they: (1) reflect state-of-the-art knowledge in the areas of curriculum, pedagogy,
and the teaching of diverse student populations; (2) support and direct teacher
preparation and staff development programs by highlighting important and critical
knowledge and skills; (3) reflect the complexity of teaching, thus increasing its
attractiveness and professionalization; and (4) help attract, rather than discourage,
the strongest and most diverse teacher candidates. Finally, information from the pilot
tests can help develop alternative funding mechanisms that will not be unduly
burdensome to teachers or local or state agencies.






BIBLIOGRAPHY



Berliner, David. (1989). Implications of Studies of Expertise in Pedagogy for
Teacher Education & Evaluation. In New Directions for Teacher Assessment,
invitational conference proceedings. Princeton, NJ: Educational Testing Service.

Berliner, BethAnn, Mata, Susana, Zalles, Dan, Little, Judith Warren. (1987).
Improving student teaching through clinical supervision. Volume Two: Supervision
and support through the eyes of student teachers and first-year teachers.
San Francisco, CA: Far West Laboratory for Educational Research and
Development.

Berliner, BethAnn, Intili, JoAnn, Little, Judith Warren, Mata, Susana, Terry, Patricia,
Zalles, Dan. (1987). Preserving preservice teacher education through clinical
supervision of student teaching. Volume four. The university supervisor.
Program impact and experience. San Francisco, CA: Far West Laboratory for
Educational Research and Development.

Borko, H. (1986). Clinical teacher education: The induction years. In James
Hoffman and Sara Edwards, eds., Reality and reform in clinical teacher
education, (pp. 45-64). New York, NY: Random House.

Borko, Hilda, Lalik, Rosary, Livingston, Carol, Pecic, Kathleen, and Perry, Diana.
(1986). Learning to teach in the induction year: Two case studies. Paper
presented at the annual meeting of the American Educational Research
Association.

Boyer, Ernest L. (1983). High school: A report to the Carnegie Foundation for the
advancement of teaching. New York: Harper & Row.

California State Department of Education. (1985). Mathematics Framework for
California Public Schools, Kindergarten through Grade Twelve. Sacramento:
California State Department of Education.

California State Department of Education. (1987). Science Model Curriculum Guide,
Kindergarten through Grade Eight. Sacramento: California State Department
of Education.

California State Department of Education. (1988). English-Language Arts, Model
Curriculum Guide, Kindergarten through Grade Eight. Sacramento: California
State Department of Education.

California State Department of Education. (1988). History-Social Science Framework,
Kindergarten through Grade Twelve. Sacramento: California State Department
of Education.

Clark, D.C., Smith, R.B., Newby, T.J., & Cook, V.A. (1985). Perceived origins of
teaching behavior. Journal of Teacher Education, 36(6), 49-53.






Gomez, Robert (1989). A report on teacher suddIy: Emolbnaents in mofes^nnai 
S^r o'eSSTg.'" "^^^^ ^''"^^^ Sacramento: Commission on 

Goodlad, John I. (1984). A place called school: Prospects for the future. New York:
McGraw-Hill.

Grant, Carl and Zeichner, Kenneth. (1981). Inservice support for first year teachers:
The state of the scene. Journal of Research and Development in Education.

Holmes Group, Inc. (1986). Tomorrow's teachers: A report of The Holmes Group.
East Lansing, MI: The Holmes Group.

Huling-Austin, Leslie. (1988). A synthesis of research on teacher induction programs

Leinhardt, Gaea. (1989). Math lessons: A contrast of novice and expert competence.
Journal for Research in Mathematics Education, 20, 52-75.

Lortie, Dan. (1975). Schoolteacher. Chicago: University of Chicago Press. 

KeSi^nte ^S-'^"^"' I"""'/oA.m, Little, Judith Warren, Stansbury, 
Kendyll. (1988). Final report. Improvement of preservice teacher
education through clinical supervision of student teachers. San Francisco, CA:
Far West Laboratory for Educational Research and Development.

McDonald, F.J. (1980). Study of induction programs for beginning teachers.
The problems of beginning teachers: A crisis in training. Princeton,
NJ: Educational Testing Service.

National Commission on Excellence in Education. (1983). A nation at risk: The
imperative for educational reform.

Odell, Sandra. (1986). Induction support of new teachers: A functional approach.
Journal of Teacher Education, 26-29.

Shulman, L.S. and Sykes, G. (1986). A national board for teaching? In search of a
bold standard. A report for the Task Force on Teaching as a Profession. New
York: Carnegie Corporation.

Shulman, Lee. (1987). Knowledge and teaching: Foundations of the new reform.
Harvard Educational Review, 57, 1-22.

Streifer, P. (1984). The Validation of Teaching Competencies in Connecticut (An
unpublished Ph.D. thesis).

Varah, Leonard, Theune, Warren, and Parker, Linda. (1986). Beginning teachers: Sink
or swim? Journal of Teacher Education, 30-34.






Veenman, Simon. (1984). Perceived problems of beginning teachers. Review of
Educational Research, 54, 143-178.

Watkins, Richard. (1985). A practitioner review of the content validity and passing
standards of the California Basic Educational Skills Test. Sacramento:
Commission on Teacher Credentialing.

Wilson, Suzanne. (1988). Understanding Historical Understanding: Subject matter
knowledge and the teaching of Teachers. A dissertation submitted to Stanford
University.

Wheeler, Pat, Hirabayashi, J.B., Maretinson, J., and Watkins, R.W. (1988). A study on
the appropriateness of fifteen NTE specialty area tests for use in credentialing in
the state of California. Emeryville, CA: Educational Testing Service.

Wheeler, Pat, and Elias, P. (1983). California Basic Educational Skills Test: Field
test and validity study report. Berkeley, CA: Educational Testing Service.

Wise, Arthur, Darling-Hammond, Linda, Berry, Barnett, Klein, Stephen P. (1987).
Licensing teachers: Design for a teacher profession. Santa Monica, CA: The
Rand Corporation.






APPENDIX A: 
SSI-SM RELIABILITY 






Two types of reliability were examined: inter-rater consistency and internal
consistency. Analyses were conducted on a subset of the assessment sample consisting of
the ten teachers whose data were available. Due to this very small sample size, these
analyses can be interpreted as exploratory investigations only.



Item Aggregation 

Six groups of tests were created. Each group was based on a different level of
aggregation of the 20 (two topics by two tasks by five indicators) items. Group 1 (Total
Test) incorporated all 20 items into one test. Group 2 (Topic) separately incorporated
the ten task-by-indicator items for each topic into two topic level tests. Similarly, Group
3 (Task) separately combined the ten topic-by-indicator items for each task into two task
level tests. Group 4 (Topic by Task) incorporated the five indicator items into four
topic-by-task level tests. Group 5 (Indicators) combined the four topic-by-task items into
five indicator level tests. Finally, Group 6 (Topic by Task by Indicators) combined each
pair of topic items into ten task-by-indicator level tests.
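
The six aggregation levels can be illustrated with a short sketch. It is not the authors' procedure, only one way to enumerate the groupings described above. The item identifiers (topic, task, indicator) and the function names are assumptions made for illustration; the topic, task, and indicator numbers are taken from the test labels in the tables (topics 1 and 2, tasks 1 and 3, indicators 1, 2, 3, 5, and 6).

```python
from itertools import product

TOPICS = (1, 2)
TASKS = (1, 3)
INDICATORS = (1, 2, 3, 5, 6)

# Each item is identified by (topic, task, indicator): 2 x 2 x 5 = 20 items.
ITEMS = list(product(TOPICS, TASKS, INDICATORS))


def group_items(key):
    """Collect items into tests; items sharing the same key value form one test."""
    tests = {}
    for item in ITEMS:
        tests.setdefault(key(item), []).append(item)
    return tests


# Hypothetical encoding of the six groups described in the text.
GROUPS = {
    1: group_items(lambda it: "all"),           # one 20-item test
    2: group_items(lambda it: it[0]),           # one test per topic
    3: group_items(lambda it: it[1]),           # one test per task
    4: group_items(lambda it: (it[0], it[1])),  # one test per topic-by-task cell
    5: group_items(lambda it: it[2]),           # one test per indicator
    6: group_items(lambda it: (it[1], it[2])),  # one test per task-by-indicator cell
}


def summarize():
    """Return {group: (number of tests, items per test)}."""
    return {
        g: (len(tests), len(next(iter(tests.values()))))
        for g, tests in GROUPS.items()
    }


if __name__ == "__main__":
    for group, (n_tests, n_items) in summarize().items():
        print(f"Group {group}: {n_tests} test(s) of {n_items} item(s)")
```

The counts produced by `summarize()` match the Ni column of the tables: tests of 20, 10, 10, 5, 4, and 2 items for Groups 1 through 6.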



Inter-rater Consistency

The primary analyses for evaluation of the inter-rater consistency of the tests are
presented in Table 1. These consisted of calculating item means for each test separately
for the first and second raters on that test (R1, R2) and for the averages from the two
sets of ratings (RS). Additionally, inter-rater correlation coefficients between raters are
presented for each test. The differences between item means across raters are uniformly
small for all levels of test aggregation, indicating that different raters were employing
similar standards within items for all item aggregations. The inter-rater correlations are
mostly moderate. The correlation for the full test is .60. For other tests the correlations
cluster around this level, with those tests having fewer items showing more variation
among correlations.
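
As a hedged illustration of the inter-rater correlation reported for each test, the sketch below computes a Pearson correlation between two raters' test scores across teachers. The report does not state the exact formula or how test scores were formed; scoring each teacher by the mean of that rater's item ratings, and the function names, are assumptions.

```python
from math import sqrt


def mean_scores(ratings):
    """ratings: one list of item ratings per teacher. Returns each teacher's mean rating."""
    return [sum(row) / len(row) for row in ratings]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one rater's scores do not vary")
    return sxy / sqrt(sxx * syy)


def inter_rater_correlation(rater1, rater2):
    """Correlate the two raters' per-teacher mean ratings on one test."""
    return pearson(mean_scores(rater1), mean_scores(rater2))
```

With only ten teachers, a single discrepant pair of ratings can move such a coefficient considerably, which is consistent with the greater variation the text notes for the shorter tests.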

As part of the pilot assessment administration, all pairs of raters were instructed
to discuss their ratings for each indicator after the initial rating was made and try to
reach a consensus. Whether or not they were able to reach a consensus, they then each
made the rating again after considering any new information derived from the discussion.
This process resulted in a duplicate set of ratings for all teachers. The analyses were
repeated on these consensus ratings, and are summarized in Table 2. Here the small
between-rater differences become even smaller and in most cases disappear, and the
inter-rater correlations, with one exception, approach unity for all tests. These results
indicate that the raters were able to reach or move toward consensus in most instances
as a result of their discussions.

Because each task employed a different pair of raters, it was possible to explore
the effect of reversing rater pairs in the construction of those tests which aggregated
across tasks (groups 1, 2, and 5). As shown in Table 3, these reversals had little
effect on item means, but substantially increased the inter-rater correlations. This




TABLE 1

MEAN ITEM RATINGS, COEFFICIENT ALPHAS, AND INTER-RATER
CORRELATIONS FOR TESTS BASED ON INDICATORS

                              MEAN ITEM RATING     COEFFICIENT ALPHA    INTR
                                                                        RATR
TEST                    Ni    R1     R2     RS     R1     R2     RS     CORR.

T1  (All items)         20    1.9    1.8    1.8    .85    .88    .87    .60

T2A (Topic 1)           10    2.1    2.0    2.0    .89    .79    .88    .56
T2B (Topic 2)           10    1.7    1.7    1.7    .72    .79    .77    .70

T3A (Task 1)            10    1.8    1.7    1.7    .90    .92    .92    .66
T3B (Task 3)            10    2.0    2.0    2.0    .59    .83    .77    .59

T4A (Tpc 1, Tsk 1)       5    1.9    1.6    1.7    .95    .91    .95    .57
T4B (Tpc 1, Tsk 3)       5    2.2    2.3    2.3    .89    .72    .84    .33
T4C (Tpc 2, Tsk 1)       5    1.7    1.7    1.7    .86    .81    .87    .70
T4D (Tpc 2, Tsk 3)       5    1.7    1.6    1.7    .87    .83    .87    .76

T5A (Indicator 1)        4    2.0    2.0    2.0    .25    .69    .59    .6-
T5B (Indicator 2)        4    2.0    2.0    2.0    .38    .51    .41    .56
T5C (Indicator 3)        4    2.1    2.0    2.1    .25    .20    .19    .39
T5D (Indicator 5)        4    1.5    1.5    1.5    .55    .64    .66    .69
T5E (Indicator 6)        4    1.7    1.7    1.7    .31    .78    .54    .32

T6A (Tsk 1, Ind 1)       2    1.8    1.8    1.8    .17    .88    .86    .63
T6B (Tsk 1, Ind 2)       2    2.2    2.0    2.1    .45    .54    .42    .54
T6C (Tsk 1, Ind 3)       2    2.0    1.8    1.9    .32    .86    .62    .49
T6D (Tsk 1, Ind 5)       2    1.6    1.5    1.6    .75    .70    .80    .88
T6E (Tsk 1, Ind 6)       2    1.4    1.4    1.4    .64    .71    .66    .51
T6F (Tsk 3, Ind 1)       2    2.2    2.2    2.2    .64    .73    .36    .74
T6G (Tsk 3, Ind 2)       2    1.9    1.9    1.9    .11    .62    .38    .63
T6H (Tsk 3, Ind 3)       2    2.3    2.3    2.3    .48    .28    0      .48
T6I (Tsk 3, Ind 5)       2    1.5    1.5    1.5    .16    .60    .34    0
T6J (Tsk 3, Ind 6)       2    2.0    2.-    2.0    0      .71    0      .12





TABLE 2

MEAN ITEM RATINGS, COEFFICIENT ALPHAS, AND INTER-RATER
CORRELATIONS FOR TESTS BASED ON CONSENSUS INDICATORS

                              MEAN ITEM RATING     COEFFICIENT ALPHA    INTR
                                                                        RATR
TEST                    Ni    R1     R2     RS     R1     R2     RS     CORR

T1  (All items)         20    1.9    1.8    1.9    .85    .86    .86    .99

T2A (Topic 1)           10    2.0    2.0    2.0    .85    .86    .86    .98
T2B (Topic 2)           10    1.7    1.7    1.7    .74    .75    .75    1.00

T3A (Task 1)            10    1.7    1.7    1.7    .91    .90    .91    1.00
T3B (Task 3)            10    2.0    2.0    2.0    .68    .78    .75    .99

T4A (Tpc 1, Tsk 1)       5    1.8    1.8    1.8    .90    .90    .90    1.00
T4B (Tpc 1, Tsk 3)       5    2.3    2.3    2.3    .81    .82    .83    .95
T4C (Tpc 2, Tsk 1)       5    1.6    1.6    1.6    .84    .85    .85    .99
T4D (Tpc 2, Tsk 3)       5    1.7    1.7    1.7    .8-    .82    .82    1.00

T5A (Consensus 1)        4    2.0    2.0    2.0    .33    .38    .36    .98
T5B (Consensus 2)        4    2.0    2.0    2.0    .38    .37    .38    .99
T5C (Consensus 3)        4    2.0    2.0    2.0    .37    .37    .37    1.00
T5D (Consensus 5)        4    1.5    1.5    1.5    .52    .42    .48    .99
T5E (Consensus 6)        4    1.7    1.7    1.7    .30    .62    .54    .88

T6A (Tsk 1, Con 1)       2    1.7    1.7    1.7    .58    .58    .58    1.00
T6B (Tsk 1, Con 2)       2    2.2    2.1    2.1    .40    .42    .42    .95
T6C (Tsk 1, Con 3)       2    1.8    1.8    1.8    .77    .77    .77    1.00
T6D (Tsk 1, Con 5)       2    1.5    1.5    1.5    .77    .63    .72    .99
T6E (Tsk 1, Con 6)       2    1.4    1.4    1.4    .86    .86    .86    1.00
T6F (Tsk 3, Con 1)       2    2.3    2.2    2.2    .44    .49    .47    .97
T6G (Tsk 3, Con 2)       2    1.9    1.9    1.9    .27    .27    .27    1.00
T6H (Tsk 3, Con 3)       2    2.3    2.3    2.3    .43    .43    .43    1.00
T6I (Tsk 3, Con 5)       2    1.6    1.6    1.6    .53    .53    .53    1.00
T6J (Tsk 3, Con 6)       2    2.0    2.1    2.0    0      0      0      .27






TABLE 3

MEAN ITEM RATINGS, COEFFICIENT ALPHAS, AND INTER-RATER
CORRELATIONS FOR TESTS BASED ON INDICATORS, RATER COMBINATIONS REVERSED

                              MEAN ITEM       COEFFICIENT     INTR
                              RATING          ALPHA           RATR
TEST                    Ni    R12    R21      R12    R21      CORR

T1  (All items)         20    1.9    1.8      .85    .82      .80

T2A (Topic 1)           10    2.1    1.9      .89    .76      .62
T2B (Topic 2)           10    1.7    1.7      .67    .81      .72

T3A (Task 1)            10
T3B (Task 3)            10

T4A (Tpc 1, Tsk 1)       5
T4B (Tpc 1, Tsk 3)       5
T4C (Tpc 2, Tsk 1)       5
T4D (Tpc 2, Tsk 3)       5

T5A (Indicator 1)        4    2.0    2.0      .39    .39      .84
T5B (Indicator 2)        4    2.0    1.9      .36    .40      .69
T5C (Indicator 3)        4    2.1    2.0      .18    0        .57
T5D (Indicator 5)        4    1.6    1.5      .71    .31      .76
T5E (Indicator 6)        4    1.7    1.7      .63    0        .80

T6A (Tsk 1, Ind 1)       2
T6B (Tsk 1, Ind 2)       2
T6C (Tsk 1, Ind 3)       2
T6D (Tsk 1, Ind 5)       2
T6E (Tsk 1, Ind 6)       2
T6F (Tsk 3, Ind 1)       2
T6G (Tsk 3, Ind 2)       2
T6H (Tsk 3, Ind 3)       2
T6I (Tsk 3, Ind 5)       2
T6J (Tsk 3, Ind 6)       2






suggests that the level of agreement between rater pairs is partially dependent upon the
individual raters employed. With a larger sample size, however, it is possible that
inter-rater correlations would be more similar between different rater pairs.



Internal Consistency 

Coefficient alphas also are presented in Table 1. These are similarly moderate to
high, mostly from the high seventies to the mid-nineties, for aggregations across all
items; across tasks and indicators; across topics and indicators; and across indicators.
For aggregations across topic and task and across topic only, where fewer items are
available to aggregate, the alphas are much smaller and less stable, with many falling
below .5. These results might suggest that neither topic/task nor topic is an appropriate
level for aggregation or, alternatively, could result primarily from the small number of
items (Ni). To investigate this second possibility we used the Spearman-Brown method to
estimate alphas for tests of equal test length (n=10) at all aggregations. These are
presented in Table 4. Here the alphas for the topic/task level aggregation (group 5) still
are unacceptably low, but those for the topic level aggregation (group 6) appear to have
risen to sufficient levels. Caution is required, however, in interpreting results of
estimations for longer tests based on two item tests, as it is unlikely that eight
additional items would be sufficiently similar to the first two to meet the assumptions of
the adjustment procedure. This, combined with the small sample, makes these results very
exploratory. With that warning in mind, it seems that these data justify aggregations
across all items; across tasks and indicators; across topics and indicators; and across
indicators. Aggregation across topics and tasks is contraindicated, while aggregation
across topics alone is marginally supported.
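
The length adjustment described above can be sketched with the standard Spearman-Brown prophecy formula. The report names the method but does not print the formula, so treating the adjustment as the textbook version is an assumption, as are the function and parameter names.

```python
def spearman_brown(alpha, n_items, target_items=10):
    """Estimate reliability if a test of n_items were changed to target_items.

    Uses the Spearman-Brown prophecy formula with length factor
    k = target_items / n_items:  k * alpha / (1 + (k - 1) * alpha).
    """
    if n_items <= 0 or target_items <= 0:
        raise ValueError("item counts must be positive")
    k = target_items / n_items
    return k * alpha / (1 + (k - 1) * alpha)


if __name__ == "__main__":
    # A four-item test with alpha .25, projected to ten items.
    print(round(spearman_brown(0.25, 4), 2))   # 0.45
    # A two-item test with alpha .17, projected to ten items.
    print(round(spearman_brown(0.17, 2), 2))   # 0.51
    # A twenty-item test with alpha .88, shortened to ten items.
    print(round(spearman_brown(0.88, 20), 2))  # 0.79
```

These three projections agree with the adjusted values legible in Table 4 for the same starting alphas, which suggests the textbook formula is at least close to what was applied. The formula assumes the added items are parallel to the existing ones, which is the assumption the text warns is unlikely to hold for the two-item tests.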

Table 2 shows that consensus ratings had little effect on internal consistency for
most tests, while Table 3 similarly indicates little effect on the alphas of reversing rater
pairs.
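
For readers unfamiliar with coefficient alpha, a minimal sketch of the usual (Cronbach) computation follows. The report does not give its formula, so the standard definition is assumed here, and the ratings in the example are invented purely to exercise the function.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Cronbach's alpha for a teachers-by-items matrix of ratings.

    scores: list of rows, one per teacher, each a list of k item ratings.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two teachers")
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Hypothetical ratings for three teachers on a two-item test.
    example = [[1, 2], [2, 1], [3, 3]]
    print(coefficient_alpha(example))
```

Because alpha depends on the number of items k, tests with only two or four items will tend to show lower and less stable values than the longer tests, as the tables illustrate.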



Dichotomization 

A final set of analyses was performed to investigate the effect on inter-rater and
internal consistencies of dichotomizing the items into sufficient/not-sufficient (1,0)
ratings. These are presented in Table 5. Dichotomization seemed to have little effect on
inter-rater consistency, except among the correlations for the two and four item tests.
These tended to be lowered somewhat, with a few increasing instead. The alphas also
stayed similar for the larger tests, while both increasing and decreasing for the smaller
tests. Overall, these results suggest that the dichotomized ratings and the original four
point ratings are equivalently reliable.
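
The dichotomization step can be pictured as follows. The appendix does not say which points on the four point scale counted as "sufficient", so the cutoff used here (ratings of 3 or 4 on a 1 to 4 scale) is purely an assumption, and the names `dichotomize` and `item_p_value` are hypothetical.

```python
def dichotomize(rating, sufficient_from=3):
    """Map a four point rating to 1 (sufficient) or 0 (not sufficient).

    The cutoff is an assumed value; the report does not state it.
    """
    return 1 if rating >= sufficient_from else 0


def item_p_value(ratings, sufficient_from=3):
    """Proportion of teachers rated sufficient on one item (the item's p-value)."""
    flags = [dichotomize(r, sufficient_from) for r in ratings]
    return sum(flags) / len(flags)


if __name__ == "__main__":
    # Hypothetical ratings of ten teachers on a single item.
    ratings = [1, 2, 2, 1, 3, 2, 4, 2, 1, 2]
    print(item_p_value(ratings))
```

Once items are recoded this way, the mean item ratings in the earlier tables become mean p-values, and the alphas and inter-rater correlations can be recomputed on the 1/0 data in the same manner as before.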






TABLE 4

ADJUSTED COEFFICIENT ALPHAS BASED ON INDICATORS, ESTIMATED
FOR EQUAL TEST LENGTHS OF 10 ITEMS

                                                           ADJUSTED
                              COEFFICIENT ALPHA            COEFFICIENT ALPHA
TEST                    Ni    R1     R2     RS       Ni    R1     R2     RS

T1  (All items)         20    .85    .88    .87      10           .79    .77

T2A (Topic 1)           10    .89    .79    .88      10    .89    .79    .88
T2B (Topic 2)           10    .72    .79    .77      10    .72    .79    .77

T3A (Task 1)            10    .90    .92    .92      10    .90    .92    .92
T3B (Task 3)            10    .59    .83    .77      10    .59    .83    .77

T4A (Tpc 1, Tsk 1)       5    .95    .91    .95      10    .97    .95    .97
T4B (Tpc 1, Tsk 3)       5    .89    .72    .84      10    .94    .84    .91
T4C (Tpc 2, Tsk 1)       5    .86    .81    .87      10    .92    .90    .93
T4D (Tpc 2, Tsk 3)       5    .87    .83    .87      10    .93    .91    .93

T5A (Indicator 1)        4    .25    .69    .59      10    .45    .85    .78
T5B (Indicator 2)        4    .38    .51    .41      10    .61    .72    .63
T5C (Indicator 3)        4    .25    .20    .19      10    .45    .38    .37
T5D (Indicator 5)        4    .55    .64    .66      10    .75    .82    .83
T5E (Indicator 6)        4    .31    .77    .54      10    .53    .89    .75

T6A (Tsk 1, Ind 1)       2    .17    .88    .86      10    .51    .97    .97
T6B (Tsk 1, Ind 2)       2    .45    .54    .42      10    .80    .85    .78
T6C (Tsk 1, Ind 3)       2    .32    .86    .62      10    .70    .97    .89
T6D (Tsk 1, Ind 5)       2    .75    .70    .80      10    .94    .92    .95
T6E (Tsk 1, Ind 6)       2    .64    .71    .66      10    .90    .92    .91
T6F (Tsk 3, Ind 1)       2    .64    .73    .36      10    .90    .93    .74
T6G (Tsk 3, Ind 2)       2    .12    .62    .38      10    .41    .89    .75
T6H (Tsk 3, Ind 3)       2    .48    .28    0        10    .82    .66    0
T6I (Tsk 3, Ind 5)       2    .16    .60    .34      10    .49    .88    .72
T6J (Tsk 3, Ind 6)       2    0      .71    0        10    0      .92    0






TABLE 5

MEAN ITEM P-VALUES, COEFFICIENT ALPHAS, AND INTER-RATER
CORRELATIONS FOR TESTS BASED ON DICHOTOMIZED INDICATORS

                              MEAN ITEM P VALUES   COEFFICIENT ALPHA    INTR
                                                                        RATR
TEST                    Ni    R1     R2     RS     R1     R2     RS     CORR

T1  (All items)         20    .23    .20    .22    .86    .88    .88    .53

T2A (Topic 1)           10    .27    .25    .26    .88    .83    .89    .56
T2B (Topic 2)           10    .20    .15    .18    .67    .68    .67    .53

T3A (Task 1)            10    .23    .13    .18    .91    .84    .90    .53
T3B (Task 3)            10    .24    .27    .26    .59    .82    .76    .57

T4A (Tpc 1, Tsk 1)       5    .26    .12    .19    .97    .96    .98    .56
T4B (Tpc 1, Tsk 3)       5    .28    .38    .33    .86    .70    .81    .38
T4C (Tpc 2, Tsk 1)       5    .20    .14    .17    .91    .13    .81    .61
T4D (Tpc 2, Tsk 3)       5    .20    .16    .18    .75    .82    .82    .82

T5A (Indicator 1)        4    .25    .22    .24    .51    .73    .67    .45
T5B (Indicator 2)        4    .25    .20    .23    .35    .32    .24    .34
T5C (Indicator 3)        4    .35    .32    .34    0      .51    .32    .48
T5D (Indicator 5)        4    .13    .10    .11    .81    .34    .77    .98
T5E (Indicator 6)        4    .20    .15    .18    .32    .75    .49    .40

T6A (Tsk 1, Ind 1)       2    .20    .15    .18    .55    .78    .93    .42
T6B (Tsk 1, Ind 2)       2    .30    .15    .23    .09    0      0      .39
T6C (Tsk 1, Ind 3)       2    .30    .20    .25    .09    .64    .53    .36
T6D (Tsk 1, Ind 5)       2    .20    .10    .15    .64    0      .46    .83
T6E (Tsk 1, Ind 6)       2    .15    .05    .10    .78    ?      .44    .36
T6F (Tsk 3, Ind 1)       2    .30    .30    .30    .09    .69    .55    .64
T6G (Tsk 3, Ind 2)       2    .20    .25    .23    0      .53    .09    .30
T6H (Tsk 3, Ind 3)       2    .40    .45    .43    0      .16    .00    .67
T6I (Tsk 3, Ind 5)       2    .05    .10    .08    ?      ?      ?      .67
T6J (Tsk 3, Ind 6)       2    .25    .25    .25    0      .53    0      .15






APPENDIX B: 
SSI-EM SCORING MATERIALS 







Candidate: , Scored by: Score: 



Scoring Form 
Lesson Planning: Fractions 



I. Components of the Lesson

a) The lesson on simplifying fractions 
1. student activity 

2. more than one representation of the content 

3. candidate's mathematical accuracy 

4. development of major idea: simplifying fractions 

b) The 3 lesson sequence 
5. emphasis on factoring 

6. flow of the three lesson sequence 

7. amount of practice in simplifying fractions 

(Also consider the teacher's response to the question
about the homework that s/he would assign.)

c) Beginning of the lesson

8. clear introduction to simplifying fractions

9. ability to motivate students 

d) Important features 

10. factoring or greatest common factor 

11. what simplifying means

12. one other idea related to simplifying fractions 

e) Difficult features 

13. knowing when the simplification is complete
14. finding factors or greatest common factor



II. Section Three: STUDENTS

a) Prior student knowledge

15. division, general concept of fractions,

16. factoring, and/or equivalent fractions 

17. (3 out of this list of 4) 

III. VIGNETTES

a) 4/20= 1/5 

18. appropriateness of response to student 

19. use of alternative representation(s) 

b) 8/18 divided by 4/4 

20. appropriateness of response to student 






TS: FRACTIONS p.1

Candidate Scorer Score

Topic Sequencing: Fractions
SCORING FORM

Based on the candidate's responses to the questions in parts A
through D rate the following categories:

RECORD THE CANDIDATE'S SORT:



* Candidate is able to accurately define: 

1. FRS 

2. LCM

3. TEA & TFO 

4. CM 

5. the other 12 concepts 

* Candidate accurately perceives the significance of ____ to
the overall topic of fractions:

6. LCM 

7. CM 



8. The candidate makes appropriate analogies (or 

accurate and understandable explanations) for the fraction 
concepts. 



9. The candidate provides an appropriate explanation
for why common denominators are needed for addition and
subtraction.

10. The candidate addresses conceptual understandings
when discussing multiplication of fractions.

11. Candidate, at some point in the interview, gives
specific attention to the concept of fractions.



12. If the candidate chooses to delete or add a topic,
s/he provides either a pedagogical or mathematical justification.



TS: FRACTIONS p. 2 



Based on the candidate's responses to the questions in part B
rate the following questions:

RECORD THE CANDIDATE'S SORT:



13. Candidate provides a good rationale for their 

distinction of difficult concepts. 



14. SUBTRACTION OF MIXED NUMBERS WITH REGROUPING
(SMR) is among the top two most difficult topics.



Candidate Scorer Score



Topic Sequencing: Ratios 
SCORING FORM 



Based on the candidate's responses to the questions in parts A
through D rate the following categories:

RECORD THE CANDIDATE'S SORT:



* Candidate is able to accurately define: 

1. FRACTIONS AS A REGION/SET (FRS) 

2. RATIO (RA) 

3. PROPORTION (PR)

4. the other 14 concepts 



* Candidate accurately perceives the significance of ____ to
the overall topic of ratios:

5. COMPARISON OF NUMBERS (CN)

6. SCALE DRAWINGS (SD)

7. FINDING THE PERCENT OF A NUMBER (PN) 



8. Candidate understands the relationship between
proportions and equal ratios.

9. The candidate makes appropriate analogies (or
accurate and understandable explanations) for the ratio concepts.

10. Candidate, at some point in the card sort, gives
specific attention to the concept of ratios.

11. If the candidate chooses to delete or add a topic,
s/he provides either a pedagogical or mathematical justification.



TS: Ratios p.

RECORD THE CANDIDATE'S SORT:



Candidate Scorer Score



Scoring Form 
Shortcuts 



A. Identifies the shortcut's strengths. (only for
yes answer to #1)



B. Identifies the shortcut's limitations. 



C. The candidate provides a suitable justification for 
teaching or not teaching the shortcut. 



D. Candidate describes appropriate ways to facilitate 
proper use of this method. 



E. Candidate describes good alternatives/complements
for teaching the same idea that is incorporated in
the shortcut.



F. The candidate provides a mathematical rationale 
for why the shortcut does or does not work. 



G. The candidate properly identifies whether or not 
the shortcut always works. 



H. The candidate properly identifies the mathematical 
concepts that are embedded in the shortcut. 



AVERAGE SCORE FOR THE 'Gozinta' Method 

Scoring Form: Shortcuts

(Page 2)

A. Identifies the shortcut's strengths. (only for
yes answer to #1)



B. Identifies the shortcut's limitations.



C. The candidate provides a suitable justification for
teaching or not teaching the shortcut.



D. Candidate describes appropriate ways to facilitate
proper use of this method.



E. Candidate describes good alternatives/complements
for teaching the same idea that is incorporated in
the shortcut.



F. The candidate provides a mathematical rationale
for why the shortcut does or does not work.



G. The candidate properly identifies whether or not
the shortcut always works.



H. The candidate properly identifies the mathematical
concepts that are embedded in the shortcut.

AVERAGE SCORE FOR THE 'Gozinta' Method
AVERAGE SCORE FOR THE '1-2-3' Method

Final Score for the shortcuts exercise



Candidate Scorer Score



SCORING FORM 
Lesson Planning: Ratios 



I. Components of the Lesson

a) The lesson on percents and fractions
1. student activity

2. candidate's mathematical accuracy

3. development of major idea(s): %s and fractions

4. candidate's discussion of content attends to both
procedures and conception

b) The 3 lesson sequence 

5. emphasis on the mechanisms of conversion 

6. flow of the three lesson sequence 

7. amount of student practice (Also consider the
teacher's response to the question about the homework that s/he
would assign.)

c) Beginning of the lesson

8. clear introduction

9. ability to motivate students

d) Important/difficult features

10. simplification of fractions to lowest terms

11. percent signifies an amount out of 100

12. The same number can be expressed as both a
fraction and a percent. Percents can be equal to fractions
even though they look different.

II. Section Three: STUDENTS

a) Prior student knowledge

13. division, fractions, percents,

14. factoring, equivalent fractions,

15. and/or simplest form

16. (4 out of this list of 6)

III. VIGNETTES

a) 2/1 - 20Q\. student tuinjcm th^t 1 Q 0» la i^^q ^rt r-r--nr 

17. appropriateness for students

b) Student converts 2/50 to 2%

18. appropriateness for students

g) Student can't begin to «anV*r ^ a/go Inf-^^ a fr;^ c^^nn 

19. appropriateness for students



APPENDIX C

ELEMENTARY EDUCATION EXAM CONSTRUCT VALIDITY

The data available for analysis were summary statistics from the Spring 1989
California pilot test. These consisted of mean item scores (mean p-values) within
content area subtests for 462 of the 480 teachers. From these means were calculated
descriptive statistics (means, standard deviations, and minimum and maximum values) for
the full group of teachers, as well as for breakdowns by undergraduate major, student
status, ethnicity, and gender. Correlations within the full group also were provided

among the subtests and with grade point average, current teacher status, and SAT scores.

These analyses cannot be used to address important technical issues
such as reliability, item or subtest bias, or content validity. They do offer, however, a
preliminary look at the construct validity of the test as a whole and of the
subject area subtests.

Correlations 

■.u correlations among the subtests ranged from .09 (not significant) for Math 
Thl?/^^' ^w^"'"'' '° 4' (P<'^^) for Math with Pedagogy. Thf mediai between- 
m Wh rpJ 'V' ^ s^lbtests, except Other Subjects, correlated significantly 
(p<.01) with GPA. No test correlated significantly with current teacher status (yes/no),
including Pedagogy. SAT-Verbal correlated significantly with Social Studies and Pedagogy,
but not with any of the other subtests including Language Arts. SAT-Math
correlated most highly with Pedagogy, and also correlated significantly with Math and
Science.

This pattern of correlations suggests that overall competence level makes a
relatively large contribution to the teachers' performances across all of the subtests. For
example, Math correlates well, as would be expected, with SAT-M and with Science,
but it also correlates with Language Arts, Pedagogy, and Social Studies.
Further, Pedagogy is not significantly correlated with teacher status; instead, it correlates
most highly with Math, LA, and SAT-M. This contraindicates the use of separate subtest
scores, and weakens the interpretation of either subtest or full test scores as measures
of pedagogical capabilities.

Mean Comparisons

The Other Subjects and Science subtests were the easiest, with mean p-values
across teachers of .74 and .71. The most difficult was Math, with a mean p-value of .66.
Standard deviations ranged between .13 and .19, so that the difference between the
easiest and hardest subtest means was approximately one half of a standard deviation.
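
The "approximately one half of a standard deviation" figure can be checked with simple arithmetic. The sketch below uses the means and the range of standard deviations quoted above; taking .16, the midpoint of that range, as a representative standard deviation is an assumption, since the report does not say which value was used.

```python
def standardized_difference(mean_a, mean_b, sd):
    """Difference between two subtest means expressed in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (mean_a - mean_b) / sd


if __name__ == "__main__":
    easiest, hardest = 0.74, 0.66    # Other Subjects and Math mean p-values
    sd_low, sd_high = 0.13, 0.19     # reported range of standard deviations
    sd_mid = (sd_low + sd_high) / 2  # assumed representative value, .16

    print(round(standardized_difference(easiest, hardest, sd_mid), 2))
    print(round(standardized_difference(easiest, hardest, sd_high), 2))
    print(round(standardized_difference(easiest, hardest, sd_low), 2))
```

Across the reported range of standard deviations the difference works out to roughly 0.4 to 0.6 standard deviations, which is consistent with the statement in the text.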

The breakdowns by undergraduate major show little interaction between major 
and subtest content area in determining mean performance; i.e. the relative order of 
pertormance level is roughly the same across all of the five specific subtest areas, regard- 




lioriM^sSrMoi^?^^^^^ most difficult for Educa- 

lish majiDrs also Scfy low l^oss aU suS^^^^ ^^y. Liberal am and Eng- 
and social studies majors alwavs score ^hLt^ Conversely, business, science. 

Math and Science subte1tsT?oKckn?^ SS^;t':;'\'"'"'' "^T"' ^'8^"^ °" 

major highest on PedagoS and Socll S?nS>i t^^^^^^ °1 Language Arts, and business 

latiinal Ividence again! t Smtmc Sliditv of^dLn -^""^^ '"I^P°" 
separate subtests. construct validity of pedagogical interpretations, or use of 

seniorslSel^ig^lJSow^h? sophomores, juniors, and 

not enrolled R above \tra"n;-^h1ife^^^ 

half to^e1Sndar?5e^?a?ioS^ ^ 

to fall between the two excent tw Acjor.? , ^"^^^sts. Asians and Hispan cs tend 

Social Sci7nrand r& ^ t^^^^^^^^^ 8'°;?P^°" and 

small, except for males scoriTno^iceah^ ^hnS^P °," ^^"''^^ differences are 

and slight below PedaeoJ^^ VLZ fl^r. °" ^"^"^ Social Studies 

test performana N^tSKm^e ^ represent group mean differences in 
bias. ""^P'® do not address the question of test 



Further Analyses



t^^ct^MXc^^^^'^^'ir^Sr" "^."f *e individual 

to evaluate the intemaUoSe'^y^nhftSd'^^^^^^^^ ^P""^ ^ 






FIGURE 1

ELEMENTARY EDUCATION EXAM
Sample: California Pilot Test Analysis Sample, Social Sciences Majors (N=45)

[Bar chart not reproduced. Vertical axis: P Value, 0.1 to 1.0. Horizontal axis: subtest
(number of items): MATHS (86), LAX (109), PEDX (53), SSX (41), SCIENCEX (36), OTHERSX (45).]

Bars represent means (over teachers) plus and minus one standard deviation of item means (over subject
areas); asterisks represent minimum and maximum teachers' item means (over subject areas).



FIGURE 2

ELEMENTARY EDUCATION EXAM
Sample: California Pilot Test Analysis Sample, Science Majors (N=13)

[Bar chart not reproduced. Vertical axis: P Value, 0.1 to 1.0. Horizontal axis: subtest
(number of items): MATHS (86), LAX (109), PEDX (53), SSX (41), SCIENCEX (36), OTHERSX (45).]

Bars represent means (over teachers) plus and minus one standard deviation of item means (over subject
areas); asterisks represent minimum and maximum teachers' item means (over subject areas).



FIGURE 3

ELEMENTARY EDUCATION EXAM
Sample: California Pilot Test Analysis Sample, Liberal Arts Majors (N=283)

[Bar chart not reproduced. Vertical axis: P Value, 0.1 to 1.0. Horizontal axis: subtest
(number of items): MATHS (86), LAX (109), PEDX (53), SSX (41), SCIENCEX (36), OTHERSX (45).]

Bars represent means (over teachers) plus and minus one standard deviation of item means (over subject
areas); asterisks represent minimum and maximum teachers' item means (over subject areas).



FIGURE 4

ELEMENTARY EDUCATION EXAM
Sample: California Pilot Test Analysis Sample, Education Majors (N=20)

[Bar chart not reproduced. Vertical axis: P Value, 0.1 to 1.0. Horizontal axis: subtest
(number of items): MATHS (86), LAX (109), PEDX (53), SSX (41), SCIENCEX (36), OTHERSX (45).]

Bars represent means (over teachers) plus and minus one standard deviation of item means (over subject
areas); asterisks represent minimum and maximum teachers' item means (over subject areas).



DOCUMENT RESUME

ED 323 198                                              SP 032 588

AUTHOR          Ross, E. Wayne
TITLE           Teacher Empowerment and the Ideology of
                Professionalism.
PUB DATE        Apr 90
NOTE            13p.; Paper presented at the Annual Convention of the
                New York State Council for the Social Studies
                (Buffalo, NY, April 6, 1990).
PUB TYPE        Speeches/Conference Papers (150) -- Viewpoints (120)
EDRS PRICE      MF01/PC01 Plus Postage.
DESCRIPTORS     Critical Thinking; *Decision Making; Elementary
                Secondary Education; *Ideology; *Organizational
                Climate; Politics of Education; *Power Structure;
                *Professional Autonomy; *Teacher Influence
IDENTIFIERS     *Empowerment



ABSTRACT 

The rhetoric and results of efforts to empower and 
professionalize teachers are examined to gain insight into ways in 
which the language of educational reform functions in both 
maintaining and changing power relations. This critical analysis 
clarifies how the ways people communicate both influence and are 
influenced by the structures and forces of social institutions, e.g.,
schools, universities, unions, and school boards. How the ideology of 
professionalism operates is illustrated by examining two realms of 
authority related to schooling: (1) organization-management authority 
over schools (characteristically, political and social); and (2) 
educational authority within the schools (substance matters, such as 
curriculum content, pedagogy, etc.). The analysis concurs with 
findings of other researchers that even relatively neutral statements 
reflect acts of valuation. It is concluded that the interests served 
in the process of professionalizing teaching may not include the 
interests of the teachers themselves. To further these interests, 
teachers will have to regain control over the curriculum as well as 
school organization issues and develop a much stronger voice in the 
production of knowledge about teaching. (JD)









Teacher Empowerment and the Ideology of Professionalism 

E. Wayne Ross 
Department of Educational Theory & Practice 
University at Albany 
State University of New York 



Education 113A 
1400 Washington Ave. 
Albany, NY 12222 
518-442-5068 





Paper presented at the New York State Council for the Social
Studies Annual Convention, April 6, 1990, Buffalo.






Teacher Empowerment and the Ideology of Professionalism 

Distinguishing fact from opinion has often been cited
as a basic skill needed for effective work in social 
studies. Social studies methods books outline how teachers 
should use sources, such as newspapers, to help students 
develop the skill of distinguishing between those statements 
based on verifiable information (facts) and those statements 
about which reasonable people might differ (opinions) 
(Wesley and Wronski 1964; New York State Social Studies 
syllabus 1987). As one methods text stated, "the careful 
reader soon senses that he [sic] is often getting a mixture
of facts and opinions. He soon learns to detect the 
qualitative adjectives and the emotionally charged words and 
to sense when the author is stating opinions and when he is 
sticking to the facts" (Wesley and Wronski 1964, 197). 

Unfortunately, as you already know, distinguishing
between facts and opinions is not usually as simple as
presented in this example. In their now classic methods
text, Hunt and Metcalf note that:

Careful analysis suggests that the distinction commonly
made between judgments of fact and judgments of value
is misleading. . . . The usual distinction conveys the
notion that judgments of fact are divorced from acts of
evaluation; that they are merely true or false
descriptions of a physical reality outside of the
observer -- objective, exact, and dependable; and that
judgments of value refer to nothing existent or
substantial. . . . It is misleading to suppose that any such
hard-and-fast distinction can be made between
statements. . . . In one sense all statements are
evaluative. . . . Even relatively neutral statements may
reflect acts of valuation. . . . It seems likely that all
thought involves the making of valuations -- continuous
selection of what is important in relation to one's
ends. (1968, p. 130)

What would a careful analysis of current educational 
thought reveal about the valuations behind calls for reforms 
such as teacher empowerment and professionalism? By 
examining the rhetoric and results of efforts to empower and 
professionalize teachers, we might gain insight into how the 
language of educational reform functions in both maintaining 
and changing power relations. This type of critical 
analysis can help us better understand how the ways we
communicate influence and are influenced by the structures
and forces of social institutions (such as schools,
universities, unions, and school boards). It can also
reveal these processes, allowing people to become more
conscious of them and more able to resist and change them.

The analysis might start with the following statement: 
"Efforts to achieve empowerment for teachers, such as shared 
decision-making in schools, have been positive steps toward 
a professional and autonomous role for teachers in schools." 
Is this statement a fact or an opinion? 



Answering this question will involve investigating
the origins of our ideas about teacher professionalism
and uncovering how these ideas operate to serve particular
social, economic, and political interests; that is,
uncovering the ideology of professionalism. I will attempt
to illustrate how the ideology of professionalism operates
by examining two realms of authority related to schooling:
(a) organization-management authority over schools
(characteristically political and social) and (b)
educational authority within the schools (substance matters
such as curriculum content, pedagogy, etc.). I'll begin
with the latter of these realms.

Academic Knowledge and Curricular Control

The recent history of teaching is a history of
ever-increasing state intervention in teaching and curriculum
development (Apple 1986). In the 1950s and 1960s,
America's educational "crisis" was defined in relation to
the scientific and ideological advances of the Soviet Union.
The schools were defined as a tool of national power. The
economic, ideological, and military struggle with the Soviet
Union, therefore, hinged on setting the schools straight.

As Michael Apple points out in his book Education and
Power, during this particular era of reform there was
"strong pressure from academics, capital, and the state to
reinstitute academic disciplinary knowledge as the most
'legitimate' content for the schools" (1986, p. 36). As we
all know, the educational "crisis" of the 1950s and 1960s
resulted in the production of a great number of curriculum
programs intended for use in elementary and secondary
schools. It is important to note that these programs were
developed, for the most part, by individuals outside of the
schools. The focus was on producing curriculum materials
that were academically rigorous and systematic, and that left
little room for teacher judgment in their implementation.

In many of these curriculum programs (particularly
those intended for use at the elementary level), everything
a teacher needed was provided, with plans and activities
prespecified. The cost of the curriculum development was
subsidized by the government, and the National Defense
Education Act allowed schools to be reimbursed for
purchasing the materials. The new curricula were attractive
because they had been developed by the "experts" and the
cost of purchasing the materials was low. Most schools
purchased the curricula because it seemed illogical not to.

If you are familiar with these curriculum projects 
(e.g., High School Geography Project, MACOS, etc.) you know 
that they did not have a lasting impact (if any) on the way 
social studies was taught in schools. Teachers resisted 
these curriculum innovations by teaching the "new math" and 
the "new social studies" in the same manner as the old math 
and social studies. 

The state's role in sponsoring changes in curriculum
and teaching practice in the 1950s and 1960s is important,
however, as an example of how attempts to rationalize
education have led to a means-ends argument that ultimately
justifies a reduction in teachers' authority to make
decisions regarding curriculum and pedagogy. Conformity and
standardized practice, rather than professionalism and
autonomy, are the result of such approaches to curricular
reform.

Our current educational "crisis" and proposals for
fixing the schools in many ways reflect what
occurred 30 years ago. Japan has been substituted for
the Soviet Union as the "dark incentive" for restructuring the
schools (Feinberg 1990). The proposals presented in
national reports such as A Nation At Risk and The Twentieth
Century Fund's Making the Grade once again focus on the
schools as the key to maintaining America's economic and
military superiority. As the National Commission puts it,
"Education is one of the chief engines of a society's
material well-being. . . . Citizens also know in their bones
that the safety of the United States depends principally on
the wit, skill, and spirit of the self-confident people,
today and tomorrow" (p. 17).

What these reports (and, more broadly, the efforts of the
New Right) represent is an attempt to "intervene 'on the
terrain of ordinary, contradictory common-sense,' to
'interrupt, renovate, and transform in a more systematic
direction' people's practical consciousness" (Apple, 1990,
p. 38). What has been accomplished is a translation of an
economic doctrine into the language of experience,
common-sense, and moral imperative; a language that leads to
the rationalization of teachers' work and to teachers' loss
of control over it.

An example of the current version of this argument may 
be helpful. Social studies teaching and curricula are seen 
as bland and non-substantive. What is lacking is a fullness 
of knowledge, an objective picture of world realities. The 
more rapid the pace of change in our world (the more
culturally diverse the nation becomes), the more critical it
is for us to remember and understand the central ideas,
events, people, and works that have shaped "our" (white,
middle class, male) society. The former ways of teaching
and curricular control are neither powerful nor efficient
enough for this situation. Teachers aren't sophisticated or
knowledgeable enough, so we must call in a group of
"nationally recognized scholars" to revamp the curriculum
and to develop accountability systems to make certain that
the new curricula actually reach the classrooms (e.g., an
increase in mandated testing at all levels; in New York
State, an increase from one to six state-prepared social
studies tests).

Contradictory consequences can be seen in both past and
current curriculum reform movements. Whether by the
teacher-proof curricula of an earlier era, or by highly
centralized curriculum change with extensive accountability
mechanisms, such as the one in New York State, teachers have
been systematically "freed" from making decisions in the
realm of educational authority. By "freeing" teachers of
the responsibility for conceptualizing, planning, and
evaluating the curricula they teach, these movements helped
to legitimate new forms of control and greater state
intervention in teaching and curriculum. Technical and
industrial models (that have grown out of Taylorism) have
been used for systematic integration of testing, objectives,
and curriculum; competency-based instruction; prepackaged
curricula; etc. These are models that leave little or no room
for teachers to exercise autonomous professional judgment
about curriculum or to define and enforce professional
standards of practice.

Intensification, Professionalism, and Teaching

The "reform" mechanisms that have been briefly outlined
here illustrate how the separation of conception from
execution in teachers' work has had a deskilling/reskilling
effect. When jobs are deskilled, the knowledge that was
controlled and used by workers in carrying out their
day-to-day lives on their jobs goes somewhere. In its place, new
more routinized techniques are required to complete the job
(reskilling).

In addition to affecting teachers' control of decisions 
about curriculum and pedagogy, this process also works to 
redefine the organization/management structure of schools. 
The process of deskilling/reskilling is one in which the 
control of the teaching (labor) process is changed. For
example, skills that teachers have developed as a result of
education and job experience are broken into discrete units
and redefined into specialized jobs by management (e.g.,
curriculum conceptualization is centralized at the state
level; evaluation is done by standardized tests; resource
room teachers handle remediation; and students are organized
by tracks for teaching). The redefinition and specialization
are done to increase efficiency and control of the labor
process. As a result, teachers' control over timing, over
defining appropriate practices, and over criteria used to
indicate acceptable performance is taken over by management
personnel (who are usually separated from the context of the
work). As Apple points out, "deskilling, then, often leads
to the atrophy of valuable skills that workers possessed,
since there is no longer any 'need' for them..." (1986, p.
209).

The increased specialization and routinization of
reskilled jobs are accompanied by intensification, that is,
"more, quicker, faster." Aspects of intensification are
increasingly found in schools dominated by prespecified
curricula, repeated testing, and strict and reductive
accountability systems (Apple 1986). These procedures
affect the structure of teachers' work by increasing the
amount of time spent on administrative matters and by
requiring teachers to rely even more heavily on ideas and
processes provided by "experts." For example, more time is
spent on test-taking skills or on drilling students on test
items. As responsibility for creating one's own curriculum
decreases, technical and management concerns become the
foremost part of teachers' work.

Shared or joint decision making, as it currently 
operates in schools, is one way in which the realm of 
teacher professionalism is strictly defined in order to 
place rational limits on areas of teacher involvement. For
example, Erlandson and Bifano (1987) state that,

Shared decision making in the school does not mean
indiscriminate involvement of teachers in all
decisions. Their professionalism suggests that they
are best involved in decisions relating to their
expertise. (p. 34)

By strictly redefining and controlling teachers' labor, 
the argument can be made that the degree of teachers' 
participation in decision making should increase only as
the consequences of the decisions affect a narrowly defined 
"area of expertise." In other words, it is only in 
decisions of a technical nature that teachers have the most 
interest and the most expertise and should be involved (see 
Erlandson and Bifano, 1987). 

Shared decision making is then construed as a way of 
extending and enhancing administrative control over a wider 
range of decisional issues. Shared decision making increases
the involvement of teachers in limited areas of decision
making, leaving intact and even enhancing the hierarchical
structure of schools.


It's paradoxical that a situation which has led to the
slow erosion of teachers' control over their jobs has been
combined with the rhetoric of increased professionalism.
Professionalism and increased responsibility go hand in
hand; in this case, however, teachers find themselves making
more technical/management decisions, working longer hours,
and having less control over the curricula they teach.

So what's the verdict in our exercise to distinguish
fact from opinion in the statement that "Efforts to
achieve empowerment for teachers, such as shared
decision-making in schools, have been positive steps toward a
professional and autonomous role for teachers in schools"?
This analysis suggests that Hunt and Metcalf were right.
Even relatively neutral statements reflect acts of
valuation. It is evident that our current conceptions of
teacher professionalism and reform measures taken on the
basis of these conceptions serve specific interests within
education. My suggestion is that the interests served to
this point in the process of "professionalizing" teaching
may not include those of the teachers themselves. We must not
confuse losses and victories. Teachers have made important
advances toward autonomous professionalism; however, it is
important that increased control over predefined
technical/managerial decisions not be equated with increased
professionalism. To be truly autonomous professionals,
teachers will have to regain control over the curriculum as
well as school organization issues and develop a much
stronger voice in the production of knowledge about
teaching.






References 

Apple, M. W. 1986. Education and power. Boston: Routledge & Kegan Paul.

Apple, M. W. 1990. The politics of common sense: Schooling, populism, and the New Right. In H. A. Giroux & P. McLaren (Eds.), Critical pedagogy, the state and cultural struggle (pp. 32-49). Albany, NY: State University of New York Press.

Erlandson, D. A., & Bifano, S. L. 1987. Teacher empowerment: What research says to the principal. NASSP Bulletin, 71(503), 31-36.

Feinberg, W. 1989. Fixing the schools: The ideological turn. In H. A. Giroux & P. McLaren (Eds.), Critical pedagogy, the state and cultural struggle (pp. 69-91). Albany, NY: State University of New York Press.

Hunt, M. P., & Metcalf, L. E. 1968. Teaching high school social studies: Problems in reflective thinking and social understanding. New York: Harper & Row.

New York State Education Department. 1987. 9 & 10 Social Studies: Global studies. Albany, NY: Author.

Wesley, E. B., & Wronski, S. P. 1964. Teaching social studies in high school (5th ed.). Boston: Heath.






